Coordinated batching and DVFS for DNN inference on GPU accelerators
SM Nabavinejad, S Reda… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Employing hardware accelerators to improve the performance and energy-efficiency of DNN
applications is on the rise. One challenge of using hardware accelerators, including the GPU …
GRAF: A graph neural network based proactive resource allocation framework for SLO-oriented microservices
Microservice is an architectural style that has been widely adopted in various latency-
sensitive applications. Similar to the monolith, autoscaling has attracted the attention of …
Aquatope: QoS-and-uncertainty-aware resource management for multi-stage serverless workflows
Multi-stage serverless applications, i.e., workflows with many computation and I/O stages, are
becoming increasingly representative of FaaS platforms. Despite their advantages in terms …
No Provisioned Concurrency: Fast RDMA-codesigned Remote Fork for Serverless Computing
Serverless platforms essentially face a tradeoff between container startup time and
provisioned concurrency (i.e., cached instances), which is further exaggerated by the frequent …
Harmonizing Efficiency and Practicability: Optimizing Resource Utilization in Serverless Computing with Jiagu
Current serverless platforms struggle to optimize resource utilization due to their dynamic
and fine-grained nature. Conventional techniques like overcommitment and autoscaling fall …
SIMPPO: A scalable and incremental online learning framework for serverless resource management
Serverless Function-as-a-Service (FaaS) offers improved programmability for customers, yet
it is not server-"less" and comes at the cost of more complex infrastructure management (e.g. …
Parslo: A gradient descent-based approach for near-optimal partial SLO allotment in microservices
A Mirhosseini, S Elnikety, TF Wenisch - … of the ACM Symposium on Cloud …, 2021 - dl.acm.org
Modern cloud services are implemented as graphs of loosely-coupled microservices to
improve programmability, reliability, and scalability. Service Level Objectives (SLOs) define …
Lifting the fog of uncertainties: Dynamic resource orchestration for the containerized cloud
The advances in virtualization technologies have sparked a growing transition from virtual
machine (VM)-based to container-based infrastructure for cloud computing. From the …
AutoMan: Resource-efficient provisioning with tail latency guarantees for microservices
B Cai, B Wang, M Yang, Q Guo - Future Generation Computer Systems, 2023 - Elsevier
Modern user-facing services are progressively evolving from large monolithic applications to
complex graphs of loosely-coupled microservices. While microservice architecture greatly …
TopFull: An Adaptive Top-Down Overload Control for SLO-Oriented Microservices
Microservice has become a de facto standard for building large-scale cloud applications.
Overload control is essential in preventing microservice failures and maintaining system …