A survey of techniques for modeling and improving reliability of computing systems

S Mittal, JS Vetter - IEEE Transactions on Parallel and …, 2015 - ieeexplore.ieee.org
Recent trends of aggressive technology scaling have greatly exacerbated the occurrences
and impact of faults in computing systems. This has made 'reliability' a first-order design …

DRAM errors in the wild: a large-scale field study

B Schroeder, E Pinheiro, WD Weber - ACM SIGMETRICS Performance …, 2009 - dl.acm.org
Errors in dynamic random access memory (DRAM) are a common form of hardware failure
in modern compute clusters. Failures are costly both in terms of hardware replacement costs …

SWIFT: Software implemented fault tolerance

GA Reis, J Chang, N Vachharajani… - … symposium on Code …, 2005 - ieeexplore.ieee.org
To improve performance and reduce power, processor designers employ advances that
shrink feature sizes, lower voltage levels, reduce noise margins, and increase clock rates …

Revisiting memory errors in large-scale production data centers: Analysis and modeling of new trends from the field

J Meza, Q Wu, S Kumar, O Mutlu - 2015 45th Annual IEEE/IFIP …, 2015 - ieeexplore.ieee.org
Computing systems use dynamic random-access memory (DRAM) as main memory. As
prior works have shown, failures in DRAM devices are an important source of errors in …

A study of DRAM failures in the field

V Sridharan, D Liberty - SC'12: Proceedings of the International …, 2012 - ieeexplore.ieee.org
Most modern computer systems use dynamic random access memory (DRAM) as a main
memory store. Recent publications have confirmed that DRAM errors are a common source …

[BOOK][B] Architecture design for soft errors

S Mukherjee - 2011 - books.google.com
Architecture Design for Soft Errors provides a comprehensive description of the architectural
techniques to tackle the soft error problem. It covers the new methodologies for quantitative …

The soft error problem: An architectural perspective

SS Mukherjee, J Emer… - … Symposium on High …, 2005 - ieeexplore.ieee.org
Radiation-induced soft errors have emerged as a key challenge in computer system design.
If the industry is to continue to provide customers with the level of reliability they expect …

Techniques to reduce the soft error rate of a high-performance microprocessor

C Weaver, J Emer, SS Mukherjee… - ACM SIGARCH …, 2004 - dl.acm.org
Transient faults due to neutron and alpha particle strikes pose a significant obstacle to
increasing processor transistor counts in future technologies. Although fault rates of …

Cache and memory error detection, correction, and reduction techniques for terrestrial servers and workstations

CW Slayman - IEEE Transactions on Device and Materials …, 2005 - ieeexplore.ieee.org
As the size of the SRAM cache and DRAM memory grows in servers and workstations,
cosmic-ray errors are becoming a major concern for systems designers and end users …

SRAM interleaving distance selection with a soft error failure model

S Baeg, SJ Wen, R Wong - IEEE Transactions on Nuclear …, 2009 - ieeexplore.ieee.org
The significance of multiple cell upsets (MCUs) is revealed by sharing the soft-error test
results in three major technologies, 90 nm, 65 nm, and 45 nm. The effectiveness of single-bit …