Satori: efficient and fair resource partitioning by sacrificing short-term benefits for long-term...

Z Zhou, Y Zhang, C Delimitrou - Proceedings of the 28th ACM …, 2022 - dl.acm.org

Multi-stage serverless applications, i.e., workflows with many computation and I/O stages, are
becoming increasingly representative of FaaS platforms. Despite their advantages in terms …

Cited by 47 Related articles All 5 versions

On the opportunities of green computing: A survey

Y Zhou, X Lin, X Zhang, M Wang, G Jiang, H Lu… - arXiv preprint arXiv …, 2023 - arxiv.org

Artificial Intelligence (AI) has achieved significant advancements in technology and research
with the development over several decades, and is widely used in many areas including …

Cited by 17 Related articles All 2 versions

Adaptive QoS-aware microservice deployment with excessive loads via intra- and inter-datacenter scheduling

J Shi, K Fu, J Wang, Q Chen, D Zeng… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

User-facing applications often experience excessive loads and are shifting towards the
microservice architecture. To fully utilize heterogeneous resources, current datacenters have …

Cited by 5 Related articles All 4 versions

Ribbon: cost-effective and QoS-aware deep learning model inference using a diverse pool of cloud computing instances

B Li, RB Roy, T Patel, V Gadepally, K Gettings… - Proceedings of the …, 2021 - dl.acm.org

Deep learning model inference is a key service in many businesses and scientific discovery
processes. This paper introduces Ribbon, a novel deep learning inference serving system …

Cited by 21 Related articles All 7 versions

QoS-awareness of microservices with excessive loads via inter-datacenter scheduling

J Shi, J Wang, K Fu, Q Chen, D Zeng… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org

User-facing applications often experience excessive loads and are shifting towards
microservice software architecture. While the local datacenter may not have enough …

Cited by 15 Related articles All 3 versions

PAC: Preference-aware co-location scheduling on heterogeneous NUMA architectures to improve resource utilization

P Pang, Y Li, B Liu, Q Chen, Z Yu, Z Yu… - Proceedings of the 37th …, 2023 - dl.acm.org

Latency-critical applications directly interact with end users and often experience the diurnal
load pattern. In production, best-effort applications are often co-located with them to utilize …

Cited by 4 Related articles

Provisioning differentiated last-level cache allocations to VMs in public clouds

M Shahrad, S Elnikety, R Bianchini - … of the ACM Symposium on Cloud …, 2021 - dl.acm.org

Public cloud providers offer access to hardware resources and users rent resources by
choosing among many VM sizes. While users choose the CPU core count and main memory …

Cited by 15 Related articles

Characterizing In-Kernel Observability of Latency-Sensitive Request-Level Metrics with eBPF

M Rezvani, A Jahanshahi… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org

This paper explores a novel server observability approach using eBPF (extended Berkeley
Packet Filter) for detailed request-level performance metrics of data center latency-sensitive …

Cited by 1 Related articles

Towards SLO-Optimized LLM Serving via Automatic Inference Engine Tuning

K Cheng, Z Wang, W Hu, T Yang, J Li… - arXiv preprint arXiv …, 2024 - arxiv.org

A service-level objective (SLO) is a target performance metric of service that cloud vendors
aim to ensure. Delivering optimized SLOs can enhance user satisfaction and improve the …

Cited by 1 Related articles All 3 versions

Orchid: An Online Learning based Resource Partitioning Framework for Job Colocation with Multiple Objectives

R Chen, W Peng, Y Li, X Liu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Colocating multiple throughput-oriented jobs on the same server is a commonly used
approach for improving system throughput in modern datacenters. The shared resources of …

Cited by 3 Related articles All 4 versions