Detection and correction of silent data corruption for large-scale high-performance computing
Faults have become the norm rather than the exception for high-end computing clusters.
Exacerbating this situation, some of these faults remain undetected, manifesting themselves …
A survey of techniques for improving error-resilience of DRAM
S Mittal, MS Inukonda - Journal of Systems Architecture, 2018 - Elsevier
Aggressive process scaling and increasing demands of performance/cost efficiency have
exacerbated the incidences and impact of errors in DRAM systems. Due to this …
On representing edge structure for model matching
We show how a novel, non-linear representation of edge structure can be used to improve
the performance of model matching algorithms and object verification/recognition tasks …
LOT-ECC: Localized and tiered reliability mechanisms for commodity memory systems
AN Udipi, N Muralimanohar… - ACM SIGARCH …, 2012 - dl.acm.org
Memory system reliability is a serious and growing concern in modern servers. Existing
chipkill-level memory protection mechanisms suffer from several drawbacks. They activate a …
XED: Exposing on-die error detection information for strong memory reliability
Large-granularity memory failures continue to be a critical impediment to system reliability.
To make matters worse, as DRAM scales to smaller nodes, the frequency of unreliable bits …
Waste heat reutilization and integrated demand response for decentralized optimization of data centers
In recent years, the energy consumption of data centers (DCs) has increased rapidly. Since
the demand response (DR) capability of DCs is considerable, DCs play an important role in …
Exploring DRAM organizations for energy-efficient and resilient exascale memories
The power target for exascale supercomputing is 20MW, with about 30% budgeted for the
memory subsystem. Commodity DRAMs will not satisfy this requirement. Additionally, the …
Citadel: Efficiently protecting stacked memory from TSV and large granularity failures
PJ Nair, DA Roberts, MK Qureshi - ACM Transactions on Architecture …, 2016 - dl.acm.org
Stacked memory modules are likely to be tightly integrated with the processor. It is vital that
these memory modules operate reliably, as memory failure can require the replacement of …
COP: To compress and protect main memory
DJ Palframan, NS Kim, MH Lipasti - ACM SIGARCH Computer …, 2015 - dl.acm.org
Protecting main memories from soft errors typically requires special dual-inline memory
modules (DIMMs) which incorporate at least one extra chip per rank to store error-correcting …
Quantitatively modeling application resilience with the data vulnerability factor
Recent strategies to improve the observable resilience of applications require the ability to
classify vulnerabilities of individual components (e.g., data structures, instructions) of an …