GPU devices for safety-critical systems: A survey

J Perez-Cerrolaza, J Abella, L Kosmidis… - ACM Computing …, 2022 - dl.acm.org
Graphics Processing Unit (GPU) devices and their associated software programming
languages and frameworks can deliver the computing performance required to facilitate the …

Software-controlled pipeline parity in GPU architectures for error detection

G Braga, MM Gonçalves, JR Azambuja - Microelectronics Reliability, 2023 - Elsevier
Graphics Processing Units have been increasingly used to provide high-performance
processing capabilities. Still, in some areas, such as avionics, it is subject to …

Evaluating an XOR-based hybrid fault tolerance technique to detect faults in GPU pipelines

GA Braga, MM Gonçalves… - 2023 IEEE Computer …, 2023 - ieeexplore.ieee.org
Graphics Processing Units are consistently reaching new applications due to their massive
parallel execution architectures. However, some safety-critical areas, such as avionics …

Dyre: a dynamic reconfigurable solution to increase GPGPU's reliability

JER Condia, P Narducci, M Sonza Reorda… - The Journal of …, 2021 - Springer
General-purpose graphics processing units (GPGPUs) are extensively used in high-
performance computing. However, it is well known that these devices' reliability may be …

GPU Reliability Assessment: Insights Across the Abstraction Layers

L Yang, G Papadimitriou, D Sartzetakis… - 2024 IEEE …, 2024 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) are widely deployed and utilized across various
computing domains including cloud and high-performance computing. Considering its …

Evaluating low-level software-based hardening techniques for configurable GPU architectures

MM Goncalves, JER Condia, MS Reorda… - The Journal of …, 2022 - Springer
The high processing power of GPUs makes them attractive for safety-critical applications,
where transient effects are a major concern, and resilience must be enforced without …

Evaluation of Fault Mitigation Techniques Based on Approximate Computing Under Radiation

A Martínez-Álvarez… - … on Nuclear Science, 2024 - ieeexplore.ieee.org
A software technique based on approximate computing and redundancy is presented to
mitigate radiation-induced soft errors in COTS microprocessors. Approximate Computing …

Avoiding Soft Error-Induced Illegal Memory Accesses in GPU with Inter-Thread Communication

R Iwamoto, M Hashimoto - … Symposium on On-Line Testing and …, 2023 - ieeexplore.ieee.org
A soft error caused by terrestrial neutrons poses a threat to the reliability of safety-critical
systems, such as self-driving applications. These applications, often comprised of neural …

Improving GPU Reliability with Software-Managed Pipeline Parity for Error Detection and Correction

GA Braga, L Gobatto, MM Gonçalves… - 2024 IEEE 15th Latin …, 2024 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) are gaining increasing prominence for their high-
performance computing capabilities. However, specific critical sectors like avionics expose …

An Investigation into Fault Detection and Correction in GPU Pipelines with a Hybrid XOR Approach

GA Braga, L Gobatto, MM Gonçalves… - 2024 IEEE 15th Latin …, 2024 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) are continually finding new applications due to their
broad parallel execution architectures. However, in safety-critical areas such as avionics …