Workload consolidation in Alibaba clusters: the good, the bad, and the ugly

J Lin, Y Chen, S Gao, Y Lu - Proceedings of the ACM SIGOPS 30th …, 2024 - dl.acm.org

We introduce uProcess, a pure userspace process abstraction that enables CPU cores to be
rescheduled among applications at sub-microsecond timescale without trapping into the …

Is Machine Learning Necessary for Cloud Resource Usage Forecasting?

G Christofidi, K Papaioannou, TD Doudali - Proceedings of the 2023 …, 2023 - dl.acm.org

Robust forecasts of future resource usage in cloud computing environments enable high
efficiency in resource management solutions, such as autoscaling and overcommitment …

Lifting the fog of uncertainties: Dynamic resource orchestration for the containerized cloud

Y Zhang, T Zhang, G Zhang, HA Jacobsen - Proceedings of the 2023 …, 2023 - dl.acm.org

The advances in virtualization technologies have sparked a growing transition from virtual
machine (VM)-based to container-based infrastructure for cloud computing. From the …

ComboFunc: Joint Resource Combination and Container Placement for Serverless Function Scaling with Heterogeneous Container

Z Wen, Q Chen, Q Deng, Y Niu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Serverless computing provides developers with a maintenance-free approach to resource
usage, but it also transfers resource management responsibility to the cloud platform …

DLRover-RM: Resource Optimization for Deep Recommendation Models Training in the Cloud

Q Wang, T Lan, Y Tang, B Sang, Z Huang… - Proceedings of the …, 2024 - dl.acm.org

Deep learning recommendation models (DLRM) rely on large embedding tables to manage
categorical sparse features. Expanding such embedding tables can significantly enhance …

Missile: Fine-Grained, Hardware-Level GPU Resource Isolation for Multi-Tenant DNN Inference

Y Zhang, H Yu, C Han, C Wang, B Lu, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Colocating high-priority, latency-sensitive (LS) and low-priority, best-effort (BE) DNN
inference services reduces the total cost of ownership (TCO) of GPU clusters. Limited by …

IOGuard: Software-Based I/O Page Fault Handling with One CPU Core

Y Dong, Z Mi - Proceedings of the 15th Asia-Pacific Symposium on …, 2024 - dl.acm.org

Nowadays, device passthrough I/O virtualization technology has played an essential role in
cloud scenarios like network connection. However, the absence of widespread support for …

APP: Enabling soft real-time execution on densely-populated hybrid memory system

ZW Wu, YC Chen, YH Chang… - 2023 60th ACM/IEEE …, 2023 - ieeexplore.ieee.org

Memory swapping was considered slow and evil, but swapping to Ultra Low-Latency
storage like Optane has become a promising solution to save power and cost, helping …

Do Predictors for Resource Overcommitment Even Predict?

G Christofidi, TD Doudali - Proceedings of the 4th Workshop on Machine …, 2024 - dl.acm.org

Resource overcommitment allows datacenters to improve resource efficiency. In this
approach, the system allocates to the users the amount of resources to be most likely used …

Harpagon: Minimizing DNN Serving Cost via Efficient Dispatching, Scheduling and Splitting

Z Zhao, Y Hu, Z Gong, G Yang, W Li, X Liu, K Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Advances in deep neural networks (DNNs) have significantly contributed to the development
of real-time video processing applications. Efficient scheduling of DNN workloads in cloud …