Aquatope: QoS-and-uncertainty-aware resource management for multi-stage serverless workflows
Multi-stage serverless applications, i.e., workflows with many computation and I/O stages, are
becoming increasingly representative of FaaS platforms. Despite their advantages in terms …
On the opportunities of green computing: A survey
Artificial Intelligence (AI) has achieved significant advancements in technology and research
with the development over several decades, and is widely used in many areas including …
Adaptive QoS-aware microservice deployment with excessive loads via intra-and inter-datacenter scheduling
User-facing applications often experience excessive loads and are shifting towards the
microservice architecture. To fully utilize heterogeneous resources, current datacenters have …
Ribbon: cost-effective and QoS-aware deep learning model inference using a diverse pool of cloud computing instances
Deep learning model inference is a key service in many businesses and scientific discovery
processes. This paper introduces Ribbon, a novel deep learning inference serving system …
QoS-awareness of microservices with excessive loads via inter-datacenter scheduling
User-facing applications often experience excessive loads and are shifting towards
microservice software architecture. While the local datacenter may not have enough …
PAC: Preference-aware co-location scheduling on heterogeneous NUMA architectures to improve resource utilization
P Pang, Y Li, B Liu, Q Chen, Z Yu, Z Yu… - Proceedings of the 37th …, 2023 - dl.acm.org
Latency-critical applications directly interact with end users and often experience the diurnal
load pattern. In production, best-effort applications are often co-located with them to utilize …
Provisioning differentiated last-level cache allocations to VMs in public clouds
Public cloud providers offer access to hardware resources and users rent resources by
choosing among many VM sizes. While users choose the CPU core count and main memory …
Characterizing In-Kernel Observability of Latency-Sensitive Request-Level Metrics with eBPF
M Rezvani, A Jahanshahi… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
This paper explores a novel server observability approach using eBPF (extended Berkeley
Packet Filter) for detailed request-level performance metrics of data center latency-sensitive …
Towards SLO-Optimized LLM Serving via Automatic Inference Engine Tuning
A service-level objective (SLO) is a target performance metric of service that cloud vendors
aim to ensure. Delivering optimized SLOs can enhance user satisfaction and improve the …
Orchid: An Online Learning based Resource Partitioning Framework for Job Colocation with Multiple Objectives
Colocating multiple throughput-oriented jobs on the same server is a commonly used
approach for improving system throughput in modern datacenters. The shared resources of …