Aquatope: QoS-and-uncertainty-aware resource management for multi-stage serverless workflows

Z Zhou, Y Zhang, C Delimitrou - Proceedings of the 28th ACM …, 2022 - dl.acm.org
Multi-stage serverless applications, i.e., workflows with many computation and I/O stages, are
becoming increasingly representative of FaaS platforms. Despite their advantages in terms …

On the opportunities of green computing: A survey

Y Zhou, X Lin, X Zhang, M Wang, G Jiang, H Lu… - arXiv preprint arXiv …, 2023 - arxiv.org
Artificial Intelligence (AI) has achieved significant advancements in technology and research
with the development over several decades, and is widely used in many areas including …

Adaptive QoS-aware microservice deployment with excessive loads via intra- and inter-datacenter scheduling

J Shi, K Fu, J Wang, Q Chen, D Zeng… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
User-facing applications often experience excessive loads and are shifting towards the
microservice architecture. To fully utilize heterogeneous resources, current datacenters have …

Ribbon: cost-effective and QoS-aware deep learning model inference using a diverse pool of cloud computing instances

B Li, RB Roy, T Patel, V Gadepally, K Gettings… - Proceedings of the …, 2021 - dl.acm.org
Deep learning model inference is a key service in many businesses and scientific discovery
processes. This paper introduces Ribbon, a novel deep learning inference serving system …

QoS-awareness of microservices with excessive loads via inter-datacenter scheduling

J Shi, J Wang, K Fu, Q Chen, D Zeng… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
User-facing applications often experience excessive loads and are shifting towards
microservice software architecture. While the local datacenter may not have enough …

PAC: Preference-aware co-location scheduling on heterogeneous NUMA architectures to improve resource utilization

P Pang, Y Li, B Liu, Q Chen, Z Yu, Z Yu… - Proceedings of the 37th …, 2023 - dl.acm.org
Latency-critical applications directly interact with end users and often experience the diurnal
load pattern. In production, best-effort applications are often co-located with them to utilize …

Provisioning differentiated last-level cache allocations to VMs in public clouds

M Shahrad, S Elnikety, R Bianchini - … of the ACM Symposium on Cloud …, 2021 - dl.acm.org
Public cloud providers offer access to hardware resources and users rent resources by
choosing among many VM sizes. While users choose the CPU core count and main memory …

Characterizing In-Kernel Observability of Latency-Sensitive Request-Level Metrics with eBPF

M Rezvani, A Jahanshahi… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
This paper explores a novel server observability approach using eBPF (extended Berkeley
Packet Filter) for detailed request-level performance metrics of data center latency-sensitive …

Towards SLO-Optimized LLM Serving via Automatic Inference Engine Tuning

K Cheng, Z Wang, W Hu, T Yang, J Li… - arXiv preprint arXiv …, 2024 - arxiv.org
A service-level objective (SLO) is a target performance metric of service that cloud vendors
aim to ensure. Delivering optimized SLOs can enhance user satisfaction and improve the …

Orchid: An Online Learning based Resource Partitioning Framework for Job Colocation with Multiple Objectives

R Chen, W Peng, Y Li, X Liu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Colocating multiple throughput-oriented jobs on the same server is a commonly used
approach for improving system throughput in modern datacenters. The shared resources of …