A survey of rollback-recovery protocols in message-passing systems

EN Elnozahy, L Alvisi, YM Wang… - ACM Computing Surveys …, 2002 - dl.acm.org
This survey covers rollback-recovery techniques that do not require special language
constructs. In the first part of the survey we classify rollback-recovery protocols into …

A survey of recoverable distributed shared virtual memory systems

C Morin, I Puaut - IEEE Transactions on parallel and …, 1997 - ieeexplore.ieee.org
Distributed Shared Virtual Memory (DSVM) systems provide a shared memory abstraction
on distributed memory architectures. Such systems ease parallel application programming …

How to recover efficiently and asynchronously when optimism fails

OP Damani, VK Garg - Proceedings of 16th International …, 1996 - ieeexplore.ieee.org
We propose a new algorithm for recovering asynchronously from failures in a distributed
computation. Our algorithm is based on two novel concepts: a fault-tolerant vector clock to …

Fault-tolerant matrix operations for networks of workstations using diskless checkpointing

JS Plank, Y Kim, JJ Dongarra - Journal of Parallel and Distributed …, 1997 - Elsevier
Networks of workstations (NOWs) offer a cost-effective platform for high-performance,
long-running parallel computations. However, these computations must be able to tolerate the …

Lightweight logging for lazy release consistent distributed shared memory

M Costa, P Guedes, M Sequeira, N Neves, M Castro - OSDI, 1996 - usenix.org
USENIX 2nd Symposium on OS Design and Implementation (OSDI '96) …

Supporting nondeterministic execution in fault-tolerant systems

JH Slye, EN Elnozahy - … of Annual Symposium on Fault Tolerant …, 1996 - ieeexplore.ieee.org
We present a technique to track nondeterminism resulting from asynchronous events and
multithreading in log-based rollback-recovery protocols. This technique relies on using a …

Scalable fault-tolerant distributed shared memory

F Sultan, T Nguyen, L Iftode - SC'00: Proceedings of the 2000 …, 2000 - ieeexplore.ieee.org
This paper shows how a state-of-the-art software distributed shared-memory (DSM) protocol
can be efficiently extended to tolerate single-node failures. In particular, we extend a home …

A comprehensive bibliography of distributed shared memory

MR Eskicioglu - ACM SIGOPS Operating Systems Review, 1996 - dl.acm.org
M. Rasit Eskicioglu, Department of Computer …

Specification of real-time systems in real-time temporal interval logic

KT Narayana, AA Aaby - Proceedings. Real-Time Systems …, 1988 - computer.org
A real-time variant of temporal interval logic is proposed for the specification and reasoning
of real-time systems. In the framework of the logic, it is possible to specify qualitative and …

Backtrackable state with linear affine implication and assumption grammars

P Tarau, V Dahl, A Fall - … , and Security: Second Asian Computing Science …, 1996 - Springer
A general framework of handling state information for logic programming languages on top
of backtrackable assumptions (linear affine and intuitionistic implications ranging over the …