Detection and correction of silent data corruption for large-scale high-performance computing

D Fiala, F Mueller, C Engelmann… - SC'12: Proceedings …, 2012 - ieeexplore.ieee.org
Faults have become the norm rather than the exception for high-end computing clusters.
Exacerbating this situation, some of these faults remain undetected, manifesting themselves …

A survey of techniques for improving error-resilience of DRAM

S Mittal, MS Inukonda - Journal of Systems Architecture, 2018 - Elsevier
Aggressive process scaling and increasing demands of performance/cost efficiency have
exacerbated the incidences and impact of errors in DRAM systems. Due to this …

On representing edge structure for model matching

TF Cootes, CJ Taylor - Proceedings of the 2001 IEEE Computer …, 2001 - ieeexplore.ieee.org
We show how a novel, non-linear representation of edge structure can be used to improve
the performance of model matching algorithms and object verification/recognition tasks …

LOT-ECC: Localized and tiered reliability mechanisms for commodity memory systems

AN Udipi, N Muralimanohar… - ACM SIGARCH …, 2012 - dl.acm.org
Memory system reliability is a serious and growing concern in modern servers. Existing
chipkill-level memory protection mechanisms suffer from several drawbacks. They activate a …

XED: Exposing on-die error detection information for strong memory reliability

PJ Nair, V Sridharan, MK Qureshi - ACM SIGARCH Computer …, 2016 - dl.acm.org
Large-granularity memory failures continue to be a critical impediment to system reliability.
To make matters worse, as DRAM scales to smaller nodes, the frequency of unreliable bits …

Waste heat reutilization and integrated demand response for decentralized optimization of data centers

O Han, T Ding, C Mu, W Jia, Z Ma - Energy, 2023 - Elsevier
In recent years, the energy consumption of data centers (DCs) has increased rapidly. Since
the demand response (DR) capability of DCs is considerable, DCs play an important role in …

Exploring DRAM organizations for energy-efficient and resilient exascale memories

B Giridhar, M Cieslak, D Duggal, R Dreslinski… - Proceedings of the …, 2013 - dl.acm.org
The power target for exascale supercomputing is 20 MW, with about 30% budgeted for the
memory subsystem. Commodity DRAMs will not satisfy this requirement. Additionally, the …

Citadel: Efficiently protecting stacked memory from TSV and large granularity failures

PJ Nair, DA Roberts, MK Qureshi - ACM Transactions on Architecture …, 2016 - dl.acm.org
Stacked memory modules are likely to be tightly integrated with the processor. It is vital that
these memory modules operate reliably, as memory failure can require the replacement of …

COP: To compress and protect main memory

DJ Palframan, NS Kim, MH Lipasti - ACM SIGARCH Computer …, 2015 - dl.acm.org
Protecting main memories from soft errors typically requires special dual-inline memory
modules (DIMMs) which incorporate at least one extra chip per rank to store error-correcting …

Quantitatively modeling application resilience with the data vulnerability factor

L Yu, D Li, S Mittal, JS Vetter - SC'14: Proceedings of the …, 2014 - ieeexplore.ieee.org
Recent strategies to improve the observable resilience of applications require the ability to
classify vulnerabilities of individual components (e.g., data structures, instructions) of an …