Coordinated batching and DVFS for DNN inference on GPU accelerators

SM Nabavinejad, S Reda… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Employing hardware accelerators to improve the performance and energy-efficiency of DNN
applications is on the rise. One challenge of using hardware accelerators, including the GPU …

GRAF: A graph neural network based proactive resource allocation framework for SLO-oriented microservices

J Park, B Choi, C Lee, D Han - … of the 17th International Conference on …, 2021 - dl.acm.org
Microservice is an architectural style that has been widely adopted in various latency-
sensitive applications. Similar to the monolith, autoscaling has attracted the attention of …

Aquatope: QoS-and-uncertainty-aware resource management for multi-stage serverless workflows

Z Zhou, Y Zhang, C Delimitrou - Proceedings of the 28th ACM …, 2022 - dl.acm.org
Multi-stage serverless applications, i.e., workflows with many computation and I/O stages, are
becoming increasingly representative of FaaS platforms. Despite their advantages in terms …

No Provisioned Concurrency: Fast RDMA-codesigned Remote Fork for Serverless Computing

X Wei, F Lu, T Wang, J Gu, Y Yang, R Chen… - … USENIX Symposium on …, 2023 - usenix.org
Serverless platforms essentially face a tradeoff between container startup time and
provisioned concurrency (i.e., cached instances), which is further exaggerated by the frequent …

Harmonizing Efficiency and Practicability: Optimizing Resource Utilization in Serverless Computing with Jiagu

Q Liu, Y Yang, D Du, Y Xia, P Zhang, J Feng… - 2024 USENIX Annual …, 2024 - usenix.org
Current serverless platforms struggle to optimize resource utilization due to their dynamic
and fine-grained nature. Conventional techniques like overcommitment and autoscaling fall …

SIMPPO: A scalable and incremental online learning framework for serverless resource management

H Qiu, W Mao, A Patke, C Wang, H Franke… - Proceedings of the 13th …, 2022 - dl.acm.org
Serverless Function-as-a-Service (FaaS) offers improved programmability for customers, yet
it is not server-"less" and comes at the cost of more complex infrastructure management (e.g., …

Parslo: A gradient descent-based approach for near-optimal partial SLO allotment in microservices

A Mirhosseini, S Elnikety, TF Wenisch - … of the ACM Symposium on Cloud …, 2021 - dl.acm.org
Modern cloud services are implemented as graphs of loosely-coupled microservices to
improve programmability, reliability, and scalability. Service Level Objectives (SLOs) define …

Lifting the fog of uncertainties: Dynamic resource orchestration for the containerized cloud

Y Zhang, T Zhang, G Zhang, HA Jacobsen - Proceedings of the 2023 …, 2023 - dl.acm.org
The advances in virtualization technologies have sparked a growing transition from virtual
machine (VM)-based to container-based infrastructure for cloud computing. From the …

AutoMan: Resource-efficient provisioning with tail latency guarantees for microservices

B Cai, B Wang, M Yang, Q Guo - Future Generation Computer Systems, 2023 - Elsevier
Modern user-facing services are progressively evolving from large monolithic applications to
complex graphs of loosely-coupled microservices. While microservice architecture greatly …

TopFull: An Adaptive Top-Down Overload Control for SLO-Oriented Microservices

J Park, J Park, Y Jung, H Lim, H Yeo… - Proceedings of the ACM …, 2024 - dl.acm.org
Microservice has become a de facto standard for building large-scale cloud applications.
Overload control is essential in preventing microservice failures and maintaining system …