Datacenter RPCs can be general and fast
It is commonly believed that datacenter networking software must sacrifice generality to
attain high performance. The popularity of specialized distributed systems designed …
Offloading distributed applications onto SmartNICs using iPipe
Emerging Multicore SoC SmartNICs, enclosing rich computing resources (e.g., a multicore
processor, onboard DRAM, accelerators, programmable DMA engines), hold the potential to …
RDMA over commodity Ethernet at scale
Over the past one and a half years, we have been using RDMA over commodity Ethernet
(RoCEv2) to support some of Microsoft's highly-reliable, latency-sensitive services. This …
The Demikernel datapath OS architecture for microsecond-scale datacenter systems
Datacenter systems and I/O devices now run at single-digit microsecond latencies, requiring
ns-scale operating systems. Traditional kernel-based operating systems impose an …
FaSST: Fast, Scalable and Simple Distributed Transactions with Two-Sided (RDMA) Datagram RPCs
FaSST is an RDMA-based system that provides distributed in-memory transactions with
serializability and durability. Existing RDMA-based transaction processing systems use one …
NetChain: Scale-Free Sub-RTT coordination
Coordination services are a fundamental building block of modern cloud systems, providing
critical functionalities like configuration management and distributed locking. The major …
Design guidelines for high performance RDMA systems
Modern RDMA hardware offers the potential for exceptional performance, but design
choices including which RDMA operations to use and how to use them significantly affect …
ZygOS: Achieving low tail latency for microsecond-scale networked tasks
This paper focuses on the efficient scheduling on multicore systems of very fine-grain
networked tasks, which are the typical building block of online data-intensive applications …
Endurable transient inconsistency in Byte-Addressable persistent B+-Tree
With the emergence of byte-addressable persistent memory (PM), a cache line, instead of a
page, is expected to be the unit of data transfer between volatile and nonvolatile devices, but …
Sherman: A write-optimized distributed B+ tree index on disaggregated memory
Memory disaggregation architecture physically separates CPU and memory into
independent components, which are connected via high-speed RDMA networks, greatly …