A survey of techniques for modeling and improving reliability of computing systems
Recent trends of aggressive technology scaling have greatly exacerbated the occurrences
and impact of faults in computing systems. This has madereliability'a first-order design …
and impact of faults in computing systems. This has madereliability'a first-order design …
DRAM errors in the wild: a large-scale field study
B Schroeder, E Pinheiro, WD Weber - ACM SIGMETRICS Performance …, 2009 - dl.acm.org
Errors in dynamic random access memory (DRAM) are a common form of hardware failure
in modern compute clusters. Failures are costly both in terms of hardware replacement costs …
in modern compute clusters. Failures are costly both in terms of hardware replacement costs …
SWIFT: Software implemented fault tolerance
To improve performance and reduce power, processor designers employ advances that
shrink feature sizes, lower voltage levels, reduce noise margins, and increase clock rates …
shrink feature sizes, lower voltage levels, reduce noise margins, and increase clock rates …
Revisiting memory errors in large-scale production data centers: Analysis and modeling of new trends from the field
Computing systems use dynamic random-access memory (DRAM) as main memory. As
prior works have shown, failures in DRAM devices are an important source of errors in …
prior works have shown, failures in DRAM devices are an important source of errors in …
A study of DRAM failures in the field
V Sridharan, D Liberty - SC'12: Proceedings of the International …, 2012 - ieeexplore.ieee.org
Most modern computer systems use dynamic random access memory (DRAM) as a main
memory store. Recent publications have confirmed that DRAM errors are a common source …
memory store. Recent publications have confirmed that DRAM errors are a common source …
[图书][B] Architecture design for soft errors
S Mukherjee - 2011 - books.google.com
Architecture Design for Soft Errors provides a comprehensive description of the architectural
techniques to tackle the soft error problem. It covers the new methodologies for quantitative …
techniques to tackle the soft error problem. It covers the new methodologies for quantitative …
The soft error problem: An architectural perspective
SS Mukherjee, J Emer… - … Symposium on High …, 2005 - ieeexplore.ieee.org
Radiation-induced soft errors have emerged as a key challenge in computer system design.
If the industry is to continue to provide customers with the level of reliability they expect …
If the industry is to continue to provide customers with the level of reliability they expect …
Techniques to reduce the soft error rate of a high-performance microprocessor
C Weaver, J Emer, SS Mukherjee… - ACM SIGARCH …, 2004 - dl.acm.org
Transient faults due to neutron and alpha particle strikes posea significant obstacle to
increasing processor transistor counts infuture technologies. Although fault rates of …
increasing processor transistor counts infuture technologies. Although fault rates of …
Cache and memory error detection, correction, and reduction techniques for terrestrial servers and workstations
CW Slayman - IEEE Transactions on Device and Materials …, 2005 - ieeexplore.ieee.org
As the size of the SRAM cache and DRAM memory grows in servers and workstations,
cosmic-ray errors are becoming a major concern for systems designers and end users …
cosmic-ray errors are becoming a major concern for systems designers and end users …
SRAM interleaving distance selection with a soft error failure model
S Baeg, SJ Wen, R Wong - IEEE Transactions on Nuclear …, 2009 - ieeexplore.ieee.org
The significance of multiple cell upsets (MCUs) is revealed by sharing the soft-error test
results in three major technologies, 90 nm, 65 nm, and 45 nm. The effectiveness of single-bit …
results in three major technologies, 90 nm, 65 nm, and 45 nm. The effectiveness of single-bit …