Micro-Architectural features as soft-error markers in embedded safety-critical systems: preliminary study
D Kasap, A Carpegna, A Savino… - 2023 IEEE European …, 2023 - ieeexplore.ieee.org
Radiation-induced soft errors are one of the most challenging issues in Safety Critical Real-
Time Embedded System (SACRES) reliability, usually handled using different flavors of …
Reproducibility, Replicability, and Repeatability: A survey of reproducible research with a focus on high performance computing
BA Antunes, DRC Hill - arXiv preprint arXiv:2402.07530, 2024 - arxiv.org
Reproducibility is widely acknowledged as a fundamental principle in scientific research.
Currently, the scientific community grapples with numerous challenges associated with …
Vulnerability analysis of instructions for SDC-causing error detection
J Gu, W Zheng, Y Zhuang, Q Zhang - IEEE Access, 2019 - ieeexplore.ieee.org
Due to the centralization of communication in the management of data generated by diverse
Internet of Things (IoT) devices, there is a lack of reliability when data is being transferred and …
Multi-bit data flow error detection method based on SDC vulnerability analysis
Z Yan, Y Zhuang, W Zheng, J Gu - ACM Transactions on Embedded …, 2023 - dl.acm.org
One of the most difficult data flow errors to detect caused by single-event upsets in space
radiation is the Silent Data Corruption (SDC). To solve the problem of multi-bit upsets …
Response of HPC hardware to neutron radiation at the dawn of exascale
A Bustos, AJ Rubio-Montero, R Méndez… - The Journal of …, 2023 - Springer
Every computation presents a small chance that an unexpected phenomenon ruins or
modifies its output. Computers are prone to errors that, although may be very unlikely, are …
Towards end-to-end SDC detection for HPC applications equipped with lossy compression
Data reduction techniques have been widely demanded and used by large-scale high
performance computing (HPC) applications because of vast volumes of data to be produced …
Anomaly detection in scientific datasets using sparse representation
A Moon, M Kim, J Chen, SW Son - Proceedings of the First Workshop on …, 2023 - dl.acm.org
As the size and complexity of high-performance computing (HPC) systems keep growing,
scientists' ability to trust the data produced is paramount due to potential data corruption for …
Exploring the effects of silent data corruption in distributed deep learning training
The profound impact of recent developments in artificial intelligence is unquestionable. The
applications of deep learning models are everywhere, from advanced natural language …
A characterization of soft-error sensitivity in data-parallel and model-parallel distributed deep learning
The latest advances in artificial intelligence deep learning models are unprecedented. A
wide spectrum of application areas is now thriving thanks to available massive training …
Silent data corruption estimation and mitigation without fault injection
Silent data corruptions (SDCs) have been always regarded as the serious effect of radiation-
induced faults. Traditional solutions based on redundancies are very expensive in terms of …