Addressing failures in exascale computing

M Snir, RW Wisniewski, JA Abraham… - … Journal of High …, 2014 - journals.sagepub.com
We present here a report produced by a workshop on 'Addressing failures in exascale
computing' held in Park City, Utah, 4–11 August 2012. The charter of this workshop was to …

Accurate microarchitecture-level fault modeling for studying hardware faults

ML Li, P Ramachandran, UR Karpuzcu… - 2009 IEEE 15th …, 2009 - ieeexplore.ieee.org
Decreasing hardware reliability is expected to impede the exploitation of increasing
integration projected by Moore's Law. There is much ongoing research on efficient fault …

mSWAT: Low-cost hardware fault detection and diagnosis for multicore systems

SK Sastry Hari, ML Li, P Ramachandran… - Proceedings of the …, 2009 - dl.acm.org
Continued technology scaling is resulting in systems with billions of devices. Unfortunately,
these devices are prone to failures from various sources, resulting in even commodity …

Characterizing the impact of intermittent hardware faults on programs

L Rashid, K Pattabiraman… - IEEE Transactions on …, 2014 - ieeexplore.ieee.org
Extreme complementary metal-oxide-semiconductor (CMOS) technology scaling is causing
significant concerns in the reliability of computer systems. Intermittent hardware errors are …

Trace-based microarchitecture-level diagnosis of permanent hardware faults

ML Li, P Ramachandran, SK Sahoo… - … and Networks With …, 2008 - ieeexplore.ieee.org
As devices continue to scale, future shipped hardware will likely fail due to in-the-field
hardware faults. As traditional redundancy-based hardware reliability solutions that tackle …

The use of microprocessor trace infrastructures for radiation-induced fault diagnosis

M Peña-Fernandez, A Lindoso… - … on Nuclear Science, 2019 - ieeexplore.ieee.org
This work proposes a methodology to diagnose radiation-induced faults in a microprocessor
using the hardware trace infrastructure. The diagnosis capabilities of this approach are …

Hardware/software codesign architecture for online testing in chip multiprocessors

O Khan, S Kundu - IEEE Transactions on Dependable and …, 2011 - ieeexplore.ieee.org
As the semiconductor industry continues its relentless push for nano-CMOS technologies,
long-term device reliability and occurrence of hard errors have emerged as a major concern …

Microprocessor error diagnosis by trace monitoring under laser testing

M Peña-Fernández, A Lindoso… - … on Nuclear Science, 2021 - ieeexplore.ieee.org
This work explores the diagnosis capabilities of the enriched information provided by
microprocessors trace subsystem combined with laser fault injection. Laser fault injection …

Hardware fault recovery for I/O intensive applications

P Ramachandran, SKS Hari, M Li… - ACM Transactions on …, 2014 - dl.acm.org
With continued process scaling, the rate of hardware failures in commodity systems is
increasing. Because these commodity systems are highly sensitive to cost, traditional …

Techniques for Increasing Security and Reliability of IP Cores Embedded in FPGA and ASIC Designs

D Ziener - 2010 - research.utwente.nl
The focus of this work is faults and attacks in embedded systems, as well as methods to
cope with their associated overhead. This chapter gives a motivation for the topic of this …