Halfmoon: Log-optimal fault-tolerant stateful serverless computing
Serverless computing separates function execution from state management. Simple retry-
based fault tolerance might corrupt the shared state with duplicate updates. Existing …
Apus: Fast and scalable Paxos on RDMA
State machine replication (SMR) uses Paxos to enforce the same inputs for a program (e.g.,
Redis) replicated on a number of hosts, tolerating various types of failures. Unfortunately …
HovercRaft: Achieving scalability and fault-tolerance for microsecond-scale datacenter services
Cloud platform services must simultaneously be scalable, meet low tail latency service-level
objectives, and be resilient to a combination of software, hardware, and network failures …
State machine replication in containers managed by Kubernetes
Computer virtualization brought fast resource provisioning to data centers and the
deployment of pay-per-use cost models. The system virtualization provided by containers …
Derecho: Fast state machine replication for cloud services
Cloud computing services often replicate data and may require ways to coordinate
distributed actions. Here we present Derecho, a library for such tasks. The API provides …
Recent results on fault-tolerant consensus in message-passing networks
L Tseng - … Colloquium, SIROCCO 2016, Helsinki, Finland, July …, 2016 - Springer
Fault-tolerant consensus has been studied extensively in the literature, because it is one of
the important distributed primitives and has wide applications in practice. This paper surveys …
Optimal prediction of synchronization-preserving races
Concurrent programs are notoriously hard to write correctly, as scheduling nondeterminism
introduces subtle errors that are both hard to detect and to reproduce. The most common …
HAFT: Hardware-assisted fault tolerance
Transient hardware faults during the execution of a program can cause data corruptions. We
present HAFT, a fault tolerance technique using hardware extensions of commodity CPUs to …
IONIA: High-Performance Replication for Modern Disk-based KV Stores
We introduce IONIA, a novel replication protocol tailored for modern SSD-based write-
optimized key-value (WO-KV) stores. Unlike existing replication approaches, IONIA carefully …
Disaggregating stateful network functions
D Bansal, G DeGrace, R Tewari, M Zygmunt… - … USENIX Symposium on …, 2023 - usenix.org
For security, isolation, metering and other purposes, public clouds today implement complex
network functions at every server. Today's implementations, in software or on FPGAs and …