Batch: Machine learning inference serving on serverless platforms with adaptive batching
Serverless computing is a new pay-per-use cloud service paradigm that automates resource
scaling for stateless functions and can potentially facilitate bursty machine learning serving …
A survey and classification of software-defined storage systems
The exponential growth of digital information is imposing increasing scale and efficiency
demands on modern storage infrastructures. As infrastructure complexity increases, so does …
ReFlex: Remote flash ≈ local flash
Remote access to NVMe Flash enables flexible scaling and high utilization of Flash capacity
and IOPS within a datacenter. However, existing systems for remote Flash access either …
C3: Cutting tail latency in cloud data stores via adaptive replica selection
Achieving predictable performance is critical for many distributed applications, yet difficult to
achieve due to many factors that skew the tail of the latency distribution even in well …
Open problems in queueing theory inspired by datacenter computing
M Harchol-Balter - Queueing Systems, 2021 - Springer
Datacenter operations today provide a plethora of new queueing and scheduling problems.
The notion of a “job” has become more general and multi-dimensional. The ways in which …
RobinHood: Tail Latency Aware Caching - Dynamic Reallocation from Cache-Rich to Cache-Poor
Tail latency is of great importance in user-facing web services. However, maintaining low tail
latency is challenging, because a single request to a web application server results in …
Aequitas: Admission control for performance-critical RPCs in datacenters
With the increasing popularity of disaggregated storage and microservice architectures, high
fan-out and fan-in Remote Procedure Calls (RPCs) now generate most of the traffic in …
Lyra: Elastic scheduling for deep learning clusters
Organizations often build separate training and inference clusters for deep learning, and use
separate schedulers to manage them. This leads to problems for both: inference clusters …
Decibel: Isolation and sharing in disaggregated Rack-Scale storage
M Nanavati, J Wires, A Warfield - 14th USENIX Symposium on …, 2017 - usenix.org
The performance characteristics of modern non-volatile storage devices have led to storage
becoming a shared, rack-scale resource. System designers have an imperative to rethink …
Managing tail latency in datacenter-scale file systems under production constraints
Distributed file systems often exhibit high tail latencies, especially in large-scale datacenters
and in the presence of competing (and possibly higher priority) workloads. This paper …