Functional and timing implications of transient faults in critical systems

A Kritikakou, P Nikolaou… - 2022 IEEE 28th …, 2022 - ieeexplore.ieee.org
Embedded systems in critical domains, such as automotive, aviation, space domains, are
often required to guarantee both functional and temporal correctness. Considering transient …

Fine-grained vulnerability analysis of resource constrained neural inference accelerators

P Corneliou, P Nikolaou, MK Michael… - … on Defect and Fault …, 2021 - ieeexplore.ieee.org
The use of Neural Network (NN) inference on edge devices necessitates the development of
customized Neural Inference Accelerators (NIA) in an attempt to meet performance and …

Improving GPU register file reliability with a comprehensive ISA extension

MM Goncalves, JER Condia, MS Reorda… - Microelectronics …, 2020 - Elsevier
This work proposes a comprehensive ISA extension to improve GPU reliability to transient
effects. Three additional instructions are proposed, implemented, and combined with …

Analysis of kernel redundancy for soft error mitigation on embedded GPUs

A Serrano-Cases, S Alcaide… - … on Nuclear Science, 2023 - ieeexplore.ieee.org
The use of state-of-the-art commercial processors such as graphical processing units
(GPUs) is becoming increasingly common in the New Space industry to ensure high …

Improving selective fault tolerance in GPU register files by relaxing application accuracy

MM Goncalves, IP Lamb, P Rech… - … on Nuclear Science, 2020 - ieeexplore.ieee.org
The high computing power of graphics processing units (GPUs) makes them attractive for
safety-critical applications, where reliability is a major concern. This article uses an …

A dynamic hardware redundancy mechanism for the in-field fault detection in cores of GPGPUs

JER Condia, P Narducci, MS Reorda… - … on Design and …, 2020 - ieeexplore.ieee.org
In the past, in most General-Purpose Graphic Processing Units (GPGPUs) application fields
(e.g., multimedia and gaming), the reliability features were not so relevant. Nowadays …

Evaluating low-level software-based hardening techniques for configurable GPU architectures

MM Goncalves, JER Condia, MS Reorda… - The Journal of …, 2022 - Springer
The high processing power of GPUs makes them attractive for safety-critical applications,
where transient effects are a major concern, and resilience must be enforced without …

Multicore soft error reliability assessment and evaluation of compiler optimization flag effects on ARMv7 and ARMv8

U Jadhav, P Malathi, L Ost, R Garibotti… - 2023 IEEE 20th India …, 2023 - ieeexplore.ieee.org
With the fast technological advancement in submicron devices, today's communication
systems are more demanding for use of ARM based devices in safety critical systems …

New Techniques for On-line Testing and Fault Mitigation in GPU

JER Condia - Doctoral Program in Computer and Control …, 2021 - core.ac.uk
Currently, Graphics Processing Units (GPUs) are crucial devices able to boost the
execution of complex algorithms in the scientific and artificial intelligence domains …

Evaluating the Impact of Accuracy Relaxation in the Reliability of GPU Register Files

M Goncalves, I Lamb, RM Brum… - 2019 26th IEEE …, 2019 - ieeexplore.ieee.org
Thanks to the high computing power, GPUs have joined application domains where
reliability is a major concern. Faults on electronic components are mainly caused by …