Performance interference of virtual machines: A survey

W Lin, C Xiong, W Wu, F Shi, K Li, M Xu - ACM Computing Surveys, 2023 - dl.acm.org
The rapid development of cloud computing with virtualization technology has benefited both
academia and industry. For any cloud data center at scale, one of the primary challenges is …

INFaaS: Automated model-less inference serving

F Romero, Q Li, NJ Yadwadkar… - 2021 USENIX Annual …, 2021 - usenix.org
Despite existing work in machine learning inference serving, ease-of-use and cost efficiency
remain challenges at large scales. Developers must manually search through thousands of …

Twig: Multi-agent task management for colocated latency-critical cloud services

R Nishtala, V Petrucci, P Carpenter… - … Symposium on High …, 2020 - ieeexplore.ieee.org
Many of the important services running on data centres are latency-critical, time-varying, and
demand strict user satisfaction. Stringent tail-latency targets for colocated services and …

Warehouse-scale video acceleration: co-design and deployment in the wild

P Ranganathan, D Stodolsky, J Calow… - Proceedings of the 26th …, 2021 - dl.acm.org
Video sharing (e.g., YouTube, Vimeo, Facebook, TikTok) accounts for the majority of internet
traffic, and video processing is also foundational to several other key workloads (video …

Interference-aware scheduling for inference serving

D Mendoza, F Romero, Q Li, NJ Yadwadkar… - Proceedings of the 1st …, 2021 - dl.acm.org
Machine learning inference applications have proliferated through diverse domains such as
healthcare, security, and analytics. Recent work has proposed inference serving systems for …

CuttleSys: Data-driven resource management for interactive services on reconfigurable multicores

N Kulkarni, G Gonzalez-Pumariega… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
Multi-tenancy for latency-critical applications leads to resource interference and
unpredictable performance. Core reconfiguration opens up more opportunities for …

INFaaS: A model-less and managed inference serving system

F Romero, Q Li, NJ Yadwadkar, C Kozyrakis - arXiv preprint arXiv …, 2019 - arxiv.org
Despite existing work in machine learning inference serving, ease-of-use and cost efficiency
remain challenges at large scales. Developers must manually search through thousands of …

Workload consolidation in Alibaba clusters: the good, the bad, and the ugly

Y Zhang, Y Yu, W Wang, Q Chen, J Wu… - Proceedings of the 13th …, 2022 - dl.acm.org
Web companies typically run latency-critical long-running services and resource-intensive,
throughput-hungry batch jobs in a shared cluster for improved utilization and reduced cost …

Adaptive performance modeling of data-intensive workloads for resource provisioning in virtualized environment

HM Makrani, H Sayadi, N Nazari… - ACM Transactions on …, 2021 - dl.acm.org
The processing of data-intensive workloads is a challenging and time-consuming task that
often requires massive infrastructure to ensure fast data analysis. The cloud platform is the …

RESTRAIN: A dynamic and cost-efficient resource management scheme for addressing performance interference in NFV-based systems

VR Chintapalli, M Adeppady, BR Tamma - Journal of Network and …, 2022 - Elsevier
Network Functions Virtualization (NFV) replaces conventional middleboxes with their
software counterparts, known as Virtual Network Functions (VNFs), which run on general …