Batch: Machine learning inference serving on serverless platforms with adaptive batching

A Ali, R Pinciroli, F Yan, E Smirni - … International Conference for …, 2020 - ieeexplore.ieee.org
Serverless computing is a new pay-per-use cloud service paradigm that automates resource
scaling for stateless functions and can potentially facilitate bursty machine learning serving …

A survey and classification of software-defined storage systems

R Macedo, J Paulo, J Pereira, A Bessani - ACM Computing Surveys …, 2020 - dl.acm.org
The exponential growth of digital information is imposing increasing scale and efficiency
demands on modern storage infrastructures. As infrastructure complexity increases, so does …

Reflex: Remote flash ≈ local flash

A Klimovic, H Litz, C Kozyrakis - ACM SIGARCH Computer Architecture …, 2017 - dl.acm.org
Remote access to NVMe Flash enables flexible scaling and high utilization of Flash capacity
and IOPS within a datacenter. However, existing systems for remote Flash access either …

C3: Cutting tail latency in cloud data stores via adaptive replica selection

L Suresh, M Canini, S Schmid… - 12th USENIX Symposium …, 2015 - usenix.org
Achieving predictable performance is critical for many distributed applications, yet difficult to
achieve due to many factors that skew the tail of the latency distribution even in well …

Open problems in queueing theory inspired by datacenter computing

M Harchol-Balter - Queueing Systems, 2021 - Springer
Datacenter operations today provide a plethora of new queueing and scheduling problems.
The notion of a “job” has become more general and multi-dimensional. The ways in which …

RobinHood: Tail Latency Aware Caching - Dynamic Reallocation from Cache-Rich to Cache-Poor

DS Berger, B Berg, T Zhu, S Sen… - 13th USENIX Symposium …, 2018 - usenix.org
Tail latency is of great importance in user-facing web services. However, maintaining low tail
latency is challenging, because a single request to a web application server results in …

Aequitas: Admission control for performance-critical RPCs in datacenters

Y Zhang, G Kumar, N Dukkipati, X Wu, P Jha… - Proceedings of the …, 2022 - dl.acm.org
With the increasing popularity of disaggregated storage and microservice architectures, high
fan-out and fan-in Remote Procedure Calls (RPCs) now generate most of the traffic in …

Lyra: Elastic scheduling for deep learning clusters

J Li, H Xu, Y Zhu, Z Liu, C Guo, C Wang - Proceedings of the Eighteenth …, 2023 - dl.acm.org
Organizations often build separate training and inference clusters for deep learning, and use
separate schedulers to manage them. This leads to problems for both: inference clusters …

Decibel: Isolation and sharing in disaggregated Rack-Scale storage

M Nanavati, J Wires, A Warfield - 14th USENIX Symposium on …, 2017 - usenix.org
The performance characteristics of modern non-volatile storage devices have led to storage
becoming a shared, rack-scale resource. System designers have an imperative to rethink …

Managing tail latency in datacenter-scale file systems under production constraints

PA Misra, MF Borge, Í Goiri, AR Lebeck… - Proceedings of the …, 2019 - dl.acm.org
Distributed file systems often exhibit high tail latencies, especially in large-scale datacenters
and in the presence of competing (and possibly higher priority) workloads. This paper …