rMPI: increasing fault resiliency in a message-passing environment.

… As an alternative, we present a method to increase resilience … This approach effectively
increases the time between faults … of some faults, it is now possible to increase the checkpoint …

Increasing fault resiliency in a message-passing environment.

… We use signals to alert rMPI of failed nodes and can thus simulate the failure of nodes for
testing purposes. However, since all nodes still are part of a complete MPI application and due …

Toward a scalable, transactional, fault-tolerant message passing interface for petascale and exascale machines

A Hassani - 2016 - search.proquest.com
… and scalable fault-tolerant environment requires the … on resilience in large-scale computing
with the focus of fault-… rMPI [9] is a fault-tolerant implementation of MPI through process …

Legio: fault resiliency for embarrassingly parallel MPI applications

R Rocco, D Gadioli, G Palermo - The Journal of Supercomputing, 2022 - Springer
… File and remote memory access operations are not supported by ULFM and are likely to
fail in a faulty environment (rather than raising an error, they throw a segmentation …

Redundant execution of HPC applications with MR-MPI

C Engelmann, S Böhm - Proceedings of the 10th IASTED …, 2011 - christian-engelmann.info
problems, rMPI does not support certain complex MPI communicator operations, such as
MPI_Comm_split(). rMPI: Increasing fault resiliency in a message-passing environment. Technical …

Transparent redundant computing with MPI

R Brightwell, K Ferreira, R Riesen - … Advances in the Message Passing …, 2010 - Springer
… the overhead of providing resiliency through redundancy. … ideas that could improve the
performance of rMPI and limit … Increasing fault resiliency in a message-passing environment. …

Asynchronous Teams and Tasks in a Message Passing Environment

B Hazelwood - 2019 - etheses.dur.ac.uk
… must become tolerant to such faults. With large numbers of … contributes resiliency features
to the “Exascale Environment” … were also released following rMPI in 2011. After reiterating …

Lazy Fault Recovery for Redundant MPI

E Saliba - 2019 - search.proquest.com
… languages for DS is Message Passing Interface (MPI). … rely on any specific MPI library
whereas rMPI is specific to MPICH. … , then explain the need for resilience in this library, and finally …

Fault-aware group-collective communication creation and repair in MPI

R Rocco, G Palermo - European Conference on Parallel Processing, 2023 - Springer
… We integrate our solution into an existing fault-resiliency … of fault management techniques in
the Message Passing Interface (… to fault presence consists of letting the MPI call raise an error …

A robust and efficient message passing library for volunteer computing environments

R Anand, T LeBlanc, E Gabriel, J Subhlok - Journal of Grid Computing, 2011 - Springer
… MPI/FT [9], P2P-MPI [19] and rMPI [16] are MPI libraries that provide fault-tolerance through
failure or major performance degradations. Furthermore, we demonstrated that increasing