A survey on multithreading alternatives for soft error fault tolerance
Smaller transistor sizes and reduction in voltage levels in modern microprocessors induce
higher soft error rates. This trend makes reliability a primary design constraint for computer …
Survey on Redundancy Based-Fault tolerance methods for Processors and Hardware accelerators-Trends in Quantum Computing, Heterogeneous Systems and …
S Venkatesha, R Parthasarathi - ACM Computing Surveys, 2024 - dl.acm.org
Rapid progress in the CMOS technology for the past 25 years has increased the
vulnerability of processors towards faults. Subsequently, focus of computer architects shifted …
Resilience design patterns: A structured approach to resilience at extreme scale
S Hukerikar, C Engelmann - arXiv preprint arXiv:1708.07422, 2017 - arxiv.org
Reliability is a serious concern for future extreme-scale high-performance computing (HPC)
systems. While the HPC community has developed various resilience solutions, the solution …
Expert: Effective and flexible error protection by redundant multithreading
Resiliency is a first-order design concern in modern microprocessor design. Compiler-level
Redundant MultiThreading (RMT) schemes are promising because of their capability to …
Hybrid lockstep technique for soft error mitigation
M Peña-Fernández, A Serrano-Cases… - … on Nuclear Science, 2022 - ieeexplore.ieee.org
This work presents the evaluation of a new dual-core lockstep hybrid approach aimed to
improve the fault tolerance in microprocessors. Our approach takes advantage of modern …
EXPERTISE: An effective software-level redundant multithreading scheme against hardware faults
Error resilience is the primary design concern for safety- and mission-critical applications.
Redundant MultiThreading (RMT) is one of the most promising soft and hard error resilience …
Efficient selective replication of critical code regions for SDC mitigation leveraging redundant multithreading
Redundant multithreading (RMT) is an effective reliability solution that provides thread-level
replication; however, it imposes additional overheads in terms of performance loss or energy …
Resilience design patterns-a structured approach to resilience at extreme scale (version 1.0)
S Hukerikar, C Engelmann - arXiv preprint arXiv:1611.02717, 2016 - arxiv.org
In this document, we develop a structured approach to the management of HPC resilience
based on the concept of resilience-based design patterns. A design pattern is a general …
Regional soft error vulnerability and error propagation analysis for GPGPU applications
I Öz, ÖF Karadaş - The Journal of Supercomputing, 2022 - Springer
The wide use of GPUs for general-purpose computations as well as graphics programs
makes soft errors a critical concern. Evaluating the soft error vulnerability of GPGPU …
Efficient thread‐to‐core mapping alternatives for application‐level redundant multithreading
S Arslan, O Ünsal - Concurrency and Computation: Practice …, 2023 - Wiley Online Library
Redundant multithreading (RMT) is an effective thread‐level replication method to improve
the reliability requirements of applications. Although it significantly improves the robustness …