Total order broadcast and multicast algorithms: Taxonomy and survey

X Défago, A Schiper, P Urbán - ACM Computing Surveys (CSUR), 2004 - dl.acm.org
Total order broadcast and multicast (also called atomic broadcast/multicast) present an
important problem in distributed systems, especially with respect to fault-tolerance. In short …
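The snippet above only names the problem. As a rough illustration, here is a minimal sketch of fixed-sequencer ordering, one of the algorithm classes in the survey's taxonomy. A single sequencer stamps each message with a consecutive number, and every receiver delivers in stamp order, buffering any gaps. The sketch assumes no process failures and reliable channels that may reorder messages. `Sequencer`, `Receiver`, `order`, and `receive` are hypothetical names, not taken from the paper.

```python
class Sequencer:
    """Hypothetical failure-free sequencer: assigns consecutive global numbers."""

    def __init__(self):
        self.next_seq = 0

    def order(self, msg):
        # Stamp the message with the next global sequence number.
        seq = self.next_seq
        self.next_seq += 1
        return seq, msg


class Receiver:
    """Delivers stamped messages strictly in sequence-number order."""

    def __init__(self):
        self.expected = 0     # next sequence number this receiver may deliver
        self.pending = {}     # out-of-order arrivals, keyed by sequence number
        self.delivered = []

    def receive(self, seq, msg):
        self.pending[seq] = msg
        # Deliver every message that now forms a contiguous prefix.
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            self.expected += 1


seq = Sequencer()
stamped = [seq.order(m) for m in ["a", "b", "c", "d"]]
# Two receivers see the stamped messages arrive in different network orders.
r1, r2 = Receiver(), Receiver()
for s, m in [stamped[2], stamped[0], stamped[3], stamped[1]]:
    r1.receive(s, m)
for s, m in [stamped[1], stamped[3], stamped[0], stamped[2]]:
    r2.receive(s, m)
print(r1.delivered, r2.delivered)  # both ['a', 'b', 'c', 'd']
```

Both receivers deliver the same sequence even though their arrival orders differ. The single sequencer makes ordering easy but is a bottleneck and a single point of failure, which is why the other algorithm classes exist.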

[PDF] Lightweight causal and atomic group multicast

K Birman, A Schiper, P Stephenson - ACM Transactions on Computer …, 1991 - dl.acm.org
The ISIS toolkit is a distributed programming environment based on virtually synchronous
process groups and group communication. We present a new family of protocols in support …
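Causal multicast in ISIS (the CBCAST protocol introduced in this paper) is commonly explained through vector timestamps. Below is a simplified, textbook-style sketch of that delivery rule, not the paper's protocol. It assumes a fixed group of `n` processes, no failures, and no view changes; `CausalProcess`, `multicast`, and `receive` are hypothetical names. A message from sender `j` with timestamp `ts` is delivered at a process with clock `vc` only when `ts[j] == vc[j] + 1` and `ts[k] <= vc[k]` for every other `k`; otherwise it is buffered.

```python
class CausalProcess:
    """Hypothetical group member: fixed group of n processes, no failures."""

    def __init__(self, pid, n):
        self.pid = pid
        self.vc = [0] * n      # number of messages delivered from each sender
        self.pending = []      # received but not yet causally deliverable
        self.delivered = []

    def multicast(self, payload):
        # Count our own send, then stamp the message with a copy of our clock.
        self.vc[self.pid] += 1
        self.delivered.append(payload)  # the sender delivers its own message at once
        return (self.pid, list(self.vc), payload)

    def _deliverable(self, sender, ts):
        # Next message expected from the sender, and no missing causal predecessors.
        if ts[sender] != self.vc[sender] + 1:
            return False
        return all(ts[k] <= self.vc[k] for k in range(len(ts)) if k != sender)

    def receive(self, msg):
        self.pending.append(msg)
        progress = True
        while progress:  # one delivery may unblock other buffered messages
            progress = False
            for m in list(self.pending):
                sender, ts, payload = m
                if self._deliverable(sender, ts):
                    self.pending.remove(m)
                    self.vc[sender] += 1
                    self.delivered.append(payload)
                    progress = True


p0, p1, p2 = (CausalProcess(i, 3) for i in range(3))
m1 = p0.multicast("m1")
p1.receive(m1)
m2 = p1.multicast("m2")   # causally after m1
p2.receive(m2)            # buffered: m1 not yet delivered at p2
p2.receive(m1)
print(p2.delivered)       # ['m1', 'm2']
```

Here p2 receives m2 before m1. It buffers m2 because m2 causally depends on m1, and it delivers both in causal order once m1 arrives.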

Understanding fault-tolerant distributed systems

F Cristian - Communications of the ACM, 1991 - dl.acm.org
To achieve fault tolerance, a distributed system architecture incorporates redundant
processing components. Thus, before the issues which underlie fault-tolerance--or …

The timed asynchronous distributed system model

F Cristian, C Fetzer - IEEE Transactions on Parallel and …, 1999 - ieeexplore.ieee.org
We propose a formal definition for the timed asynchronous distributed system model. We
present extensive measurements of actual message and process scheduling delays and …

[Book] Introduction to reliable distributed programming

R Guerraoui, L Rodrigues - 2006 - books.google.com
In modern computing a program is usually distributed among several processes. The
fundamental challenge when developing reliable distributed programs is to support the …

Replica determinism in distributed real-time systems: A brief survey

S Poledna - Real-Time Systems, 1994 - Springer
Replication of entities is a convenient technique to achieve fault-tolerance. The problem of
replica determinism thereby is to assure that replicated entities show consistent behavior in …

[PDF] Group communication in the Amoeba distributed operating system

MF Kaashoek, AS Tanenbaum - Proc. 11th Int'l Conf. on Distr. Comp …, 1991 - research.vu.nl
Unlike many other operating systems, Amoeba is a distributed operating system that
provides group communication (i.e., one-to-many communication). We will discuss design …

Coyote: A system for constructing fine-grain configurable communication services

NT Bhatti, MA Hiltunen, RD Schlichting… - ACM Transactions on …, 1998 - dl.acm.org
Communication-oriented abstractions such as atomic multicast, group RPC, and protocols
for location-independent mobile computing can simplify the development of complex …

Consul: A communication substrate for fault-tolerant distributed programs

S Mishra, LL Peterson… - Distributed Systems …, 1993 - iopscience.iop.org
Replicating important services on multiple processors in a distributed architecture is a
common technique for constructing dependable computing systems. The authors describe a …

Testing of fault-tolerant and real-time distributed systems via protocol fault injection

S Dawson, F Jahanian, T Mitton… - Proceedings of Annual …, 1996 - ieeexplore.ieee.org
As software for distributed systems becomes more complex, ensuring that a system meets its
prescribed specification is a growing challenge that confronts software developers. This is …