Analyzing and increasing the reliability of convolutional neural networks on GPUs
FF dos Santos, PF Pimenta, C Lunardi… - IEEE Transactions …, 2018 - ieeexplore.ieee.org
Graphics processing units (GPUs) are playing a critical role in convolutional neural networks
(CNNs) for image detection. As GPU-enabled CNNs move into safety-critical environments …
[BOOK][B] Architecture design for soft errors
S Mukherjee - 2011 - books.google.com
Architecture Design for Soft Errors provides a comprehensive description of the architectural
techniques to tackle the soft error problem. It covers the new methodologies for quantitative …
Artificial neural networks for space and safety-critical applications: Reliability issues and potential solutions
P Rech - IEEE Transactions on Nuclear Science, 2024 - ieeexplore.ieee.org
Machine learning is among the greatest advancements in computer science and
engineering and is today used to classify or detect objects, a key feature in autonomous …
Low-cost program-level detectors for reducing silent data corruptions
With technology scaling, transient faults are becoming an increasing threat to hardware
reliability. Commodity systems must be made resilient to these in-field faults through very …
Application-level correctness and its impact on fault tolerance
X Li, D Yeung - 2007 IEEE 13th International symposium on …, 2007 - ieeexplore.ieee.org
Traditionally, fault tolerance researchers have required architectural state to be numerically
perfect for program execution to be correct. However, in many programs, even if execution is …
Survey on Redundancy Based-Fault tolerance methods for Processors and Hardware accelerators-Trends in Quantum Computing, Heterogeneous Systems and …
S Venkatesha, R Parthasarathi - ACM Computing Surveys, 2024 - dl.acm.org
Rapid progress in the CMOS technology for the past 25 years has increased the
vulnerability of processors towards faults. Subsequently, focus of computer architects shifted …
Examining ACE analysis reliability estimates using fault-injection
NJ Wang, A Mahesri, SJ Patel - Proceedings of the 34th annual …, 2007 - dl.acm.org
ACE analysis is a technique to provide an early reliability estimate for microprocessors. ACE
analysis couples data from abstract performance models with low level design details to …
Software-controlled fault tolerance
Traditional fault-tolerance techniques typically utilize resources ineffectively because they
cannot adapt to the changing reliability and performance demands of a system. This paper …
nZDC: A compiler technique for near zero silent data corruption
M Didehban, A Shrivastava - Proceedings of the 53rd Annual Design …, 2016 - dl.acm.org
Exponentially growing rate of soft errors makes reliability a major concern in modern
processor design. Since software-oriented approaches offer flexible protection even in off …
Compiler-managed software-based redundant multi-threading for transient fault detection
As transistors become increasingly smaller and faster with tighter noise margins, modern
processors are becoming increasingly more susceptible to transient hardware faults …