GPU devices for safety-critical systems: A survey
Graphics Processing Unit (GPU) devices and their associated software programming
languages and frameworks can deliver the computing performance required to facilitate the …
languages and frameworks can deliver the computing performance required to facilitate the …
Understanding and mitigating hardware failures in deep learning training systems
Y He, M Hutton, S Chan, R De Gruijl… - Proceedings of the 50th …, 2023 - dl.acm.org
Deep neural network (DNN) training workloads are increasingly susceptible to hardware
failures in datacenters. For example, Google experienced" mysterious, difficult to identify …
failures in datacenters. For example, Google experienced" mysterious, difficult to identify …
Data masking techniques for NoSQL database security: A systematic review
A Cuzzocrea, H Shahriar - … conference on big data (Big Data), 2017 - ieeexplore.ieee.org
This paper first presents an in-depth study of potential security vulnerabilities in MongoDB
and Cassandra, two popular NoSQL databases. We provide examples of attacks. We then …
and Cassandra, two popular NoSQL databases. We provide examples of attacks. We then …
Making convolutions resilient via algorithm-based error detection techniques
Convolutional Neural Networks (CNNs) are being increasingly used in safety-critical and
high-performance computing systems. As such systems require high levels of resilience to …
high-performance computing systems. As such systems require high levels of resilience to …
[PDF][PDF] Optimizing Selective Protection for CNN Resilience.
As CNNs are being extensively employed in high performance and safety-critical
applications that demand high reliability, it is important to ensure that they are resilient to …
applications that demand high reliability, it is important to ensure that they are resilient to …
Arithmetic-intensity-guided fault tolerance for neural network inference on GPUs
Neural networks (NNs) are increasingly employed in safety-critical domains and in
environments prone to unreliability (eg, soft errors), such as on spacecraft. Therefore, it is …
environments prone to unreliability (eg, soft errors), such as on spacecraft. Therefore, it is …
Featherweight soft error resilience for GPUs
This paper presents Flame, a hardware/software co-designed resilience scheme for
protecting GPUs against soft errors. For low-cost yet high-performance resilience, Flame …
protecting GPUs against soft errors. For low-cost yet high-performance resilience, Flame …
Lltfi: Framework agnostic fault injection for machine learning applications (tools and artifact track)
UK Agarwal, A Chan… - 2022 IEEE 33rd …, 2022 - ieeexplore.ieee.org
As machine learning (ML) has become more preva-lent across many critical domains, so
has the need to understand ML applications' resilience. While prior work like TensorFI [1] …
has the need to understand ML applications' resilience. While prior work like TensorFI [1] …
Exploiting temporal data diversity for detecting safety-critical faults in AV compute systems
Silent data corruption caused by random hardware faults in autonomous vehicle (AV)
computational elements is a significant threat to vehicle safety. Previous research has …
computational elements is a significant threat to vehicle safety. Previous research has …
Software-only based diverse redundancy for asil-d automotive applications on embedded hpc platforms
High-Performance Computing (HPC) platforms become a must in automotive systems to
enable autonomous driving. However, automotive platforms must avoid Common Cause …
enable autonomous driving. However, automotive platforms must avoid Common Cause …