A survey of rollback-recovery protocols in message-passing systems
EN Elnozahy, L Alvisi, YM Wang… - ACM Computing Surveys …, 2002 - dl.acm.org
This survey covers rollback-recovery techniques that do not require special language
constructs. In the first part of the survey we classify rollback-recovery protocols into …
An overview of checkpointing in uniprocessor and distributed systems, focusing on implementation and performance
JS Plank - 1997 - library.eecs.utk.edu
Checkpointing is the act of saving the state of a running program so that it may be
reconstructed later in time. It is an important basic functionality in computing systems that …
Software rejuvenation: Analysis, module and applications
Y Huang, C Kintala, N Kolettis… - Twenty-fifth international …, 1995 - ieeexplore.ieee.org
Software rejuvenation is the concept of gracefully terminating an application and
immediately restarting it at a clean internal state. In a client-server type of application where …
Rx: treating bugs as allergies---a safe method to survive software failures
Many applications demand availability. Unfortunately, software failures greatly reduce
system availability. Prior work on surviving software failures suffers from one or more of the …
Experiments on local positioning with Bluetooth
A Kotanen, M Hannikainen… - Proceedings ITCC …, 2003 - ieeexplore.ieee.org
This paper presents the design and implementation of the Bluetooth local positioning
application. Positioning is based on received power levels, which are converted to distance …
Analysis of preventive maintenance in transactions based software systems
Preventive maintenance of operational software systems, a novel technique for software
fault tolerance, is used specifically to counteract the phenomenon of software "aging" …
Software implemented fault tolerance: Technologies and experience
Y Huang, C Kintala - FTCS, 1993 - researchgate.net
By software implemented fault tolerance, we mean a set of software facilities to detect and
recover from faults that are not handled by the underlying hardware or operating system …
Consistent global checkpoints that contain a given set of local checkpoints
YM Wang - IEEE Transactions on Computers, 1997 - ieeexplore.ieee.org
In this paper, we consider the problem of constructing consistent global checkpoints that
contain a given set of checkpoints. We address three important issues related to this …
Software dependability in the Tandem GUARDIAN system
I Lee, RK Iyer - IEEE Transactions on Software Engineering, 1995 - ieeexplore.ieee.org
Based on extensive field failure data for Tandem's GUARDIAN operating system, the paper
discusses evaluation of the dependability of operational software. Software faults …
An empirical study of service mesh traffic management policies for microservices
A microservice architecture features hundreds or even thousands of small loosely coupled
services with multiple instances. Because microservice performance depends on many …