A survey on multithreading alternatives for soft error fault tolerance

I Oz, S Arslan - ACM Computing Surveys (CSUR), 2019 - dl.acm.org
Smaller transistor sizes and reduction in voltage levels in modern microprocessors induce
higher soft error rates. This trend makes reliability a primary design constraint for computer …

Resiliency in numerical algorithm design for extreme scale simulations

E Agullo, M Altenbernd, H Anzt… - … Journal of High …, 2022 - journals.sagepub.com
This work is based on the seminar titled 'Resiliency in Numerical Algorithm Design for
Extreme Scale Simulations' held March 1–6, 2020, at Schloss Dagstuhl, that was attended …

Efficient selective replication of critical code regions for SDC mitigation leveraging redundant multithreading

S Arslan, O Unsal - The Journal of Supercomputing, 2021 - Springer
Redundant multithreading (RMT) is an effective reliability solution that provides thread-level
replication; however, it imposes additional overheads in terms of performance loss or energy …

Redthreads: An interface for application-level fault detection/correction through adaptive redundant multithreading

S Hukerikar, K Teranishi, PC Diniz… - International Journal of …, 2018 - Springer
In the presence of accelerated fault rates, which are projected to be the norm on future
exascale systems, it will become increasingly difficult for high-performance computing (HPC) …

FERNANDO: A software transient fault tolerance approach for embedded systems based on redundant multi-threading

H Wu, R Guo, Y Hu - IEEE Access, 2021 - ieeexplore.ieee.org
As semiconductor technology scales, modern microprocessors are more vulnerable to
transient faults. Software-level fault tolerance schemes are promising because they can …

Capturing XML constraints with relational schema

Y Liu, H Zhong, Y Wang - The Fourth International Conference …, 2004 - ieeexplore.ieee.org
The use of XML as the common format for representing, exchanging, and accessing data
poses many new challenges to XML storage systems. One way to this goal is to store XML …

Affinity-aware checkpoint restart

A Saini, A Rezaei, F Mueller, P Hargrove… - Proceedings of the 15th …, 2014 - dl.acm.org
Current checkpointing techniques employed to overcome faults for HPC applications result
in inferior application performance after restart from a checkpoint for a number of …

Resiliency in Numerical Algorithm Design for Extreme Scale Simulations (Dagstuhl Seminar 20101)

L Giraud, U Rüde, L Stals - 2020 - drops.dagstuhl.de
This work is based on the seminar titled "Resiliency in Numerical Algorithm Design for
Extreme Scale Simulations" held March 1-6, 2020 at Schloss Dagstuhl, that was attended by …

TwinPCG: Dual thread redundancy with forward recovery for preconditioned conjugate gradient methods

K Dichev, DS Nikolopoulos - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
Even though iterative solvers like the Preconditioned Conjugate Gradient method (PCG)
have been studied for over fifty years, fault tolerance for such solvers has seen much …

FluidCheck: A redundant threading-based approach for reliable execution in manycore processors

R Kalayappan, SR Sarangi - ACM Transactions on Architecture and …, 2015 - dl.acm.org
Soft errors have become a serious cause of concern with reducing feature sizes. The ability
to accommodate complex, Simultaneous Multithreading (SMT) cores on a single chip …