A survey on modeling and improving reliability of DNN algorithms and accelerators

S Mittal - Journal of Systems Architecture, 2020 - Elsevier
As DNNs become increasingly common in mission-critical applications, ensuring their
reliable operation has become crucial. Conventional resilience techniques fail to account for …

A systematic literature review on hardware reliability assessment methods for deep neural networks

MH Ahmadilivani, M Taheri, J Raik… - ACM Computing …, 2024 - dl.acm.org
Artificial Intelligence (AI) and, in particular, Machine Learning (ML), have emerged to be
utilized in various applications due to their capability to learn how to solve complex …

[BOOK][B] Architecture design for soft errors

S Mukherjee - 2011 - books.google.com
Architecture Design for Soft Errors provides a comprehensive description of the architectural
techniques to tackle the soft error problem. It covers the new methodologies for quantitative …

SASSIFI: An architecture-level fault injection tool for GPU application resilience evaluation

SKS Hari, T Tsai, M Stephenson… - … Analysis of Systems …, 2017 - ieeexplore.ieee.org
As GPUs become more pervasive in both scalable high-performance computing systems
and safety-critical embedded systems, evaluating and analyzing their resilience to soft errors …

Evaluation of hybrid memory technologies using SOT-MRAM for on-chip cache hierarchy

F Oboril, R Bishnoi, M Ebrahimi… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org
Magnetic Random Access Memory (MRAM) is a very promising emerging memory
technology because of its various advantages such as nonvolatility, high density and …

Reliable on-chip systems in the nano-era: Lessons learnt and future trends

J Henkel, L Bauer, N Dutt, P Gupta, S Nassif… - Proceedings of the 50th …, 2013 - dl.acm.org
Reliability concerns due to technology scaling have been a major focus of researchers and
designers for several technology nodes. Therefore, many new techniques for enhancing and …

Eliminating microarchitectural dependency from architectural vulnerability

V Sridharan, DR Kaeli - 2009 IEEE 15th International …, 2009 - ieeexplore.ieee.org
The architectural vulnerability factor (AVF) of a hardware structure is the probability that a
fault in the structure will affect the output of a program. AVF captures both microarchitectural …

CudaDMA: optimizing GPU memory bandwidth via warp specialization

M Bauer, H Cook, B Khailany - … of 2011 international conference for high …, 2011 - dl.acm.org
As the computational power of GPUs continues to scale with Moore's Law, an increasing
number of applications are becoming limited by memory bandwidth. We propose an …

AVGI: Microarchitecture-driven, fast and accurate vulnerability assessment

G Papadimitriou, D Gizopoulos - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
We propose AVGI, a new Statistical Fault Injection (SFI)-based methodology, which delivers
orders of magnitude faster assessment of the Architectural Vulnerability Factor (AVF) of a …

Examining ACE analysis reliability estimates using fault-injection

NJ Wang, A Mahesri, SJ Patel - Proceedings of the 34th annual …, 2007 - dl.acm.org
ACE analysis is a technique to provide an early reliability estimate for microprocessors. ACE
analysis couples data from abstract performance models with low level design details to …