Fast core scheduling with userspace process abstraction

J Lin, Y Chen, S Gao, Y Lu - Proceedings of the ACM SIGOPS 30th …, 2024 - dl.acm.org
We introduce uProcess, a pure userspace process abstraction that enables CPU cores to be
rescheduled among applications at sub-microsecond timescale without trapping into the …

Is Machine Learning Necessary for Cloud Resource Usage Forecasting?

G Christofidi, K Papaioannou, TD Doudali - Proceedings of the 2023 …, 2023 - dl.acm.org
Robust forecasts of future resource usage in cloud computing environments enable high
efficiency in resource management solutions, such as autoscaling and overcommitment …

Lifting the fog of uncertainties: Dynamic resource orchestration for the containerized cloud

Y Zhang, T Zhang, G Zhang, HA Jacobsen - Proceedings of the 2023 …, 2023 - dl.acm.org
The advances in virtualization technologies have sparked a growing transition from virtual
machine (VM)-based to container-based infrastructure for cloud computing. From the …

ComboFunc: Joint Resource Combination and Container Placement for Serverless Function Scaling with Heterogeneous Container

Z Wen, Q Chen, Q Deng, Y Niu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Serverless computing provides developers with a maintenance-free approach to resource
usage, but it also transfers resource management responsibility to the cloud platform …

DLRover-RM: Resource Optimization for Deep Recommendation Models Training in the Cloud

Q Wang, T Lan, Y Tang, B Sang, Z Huang… - Proceedings of the …, 2024 - dl.acm.org
Deep learning recommendation models (DLRM) rely on large embedding tables to manage
categorical sparse features. Expanding such embedding tables can significantly enhance …

Missile: Fine-Grained, Hardware-Level GPU Resource Isolation for Multi-Tenant DNN Inference

Y Zhang, H Yu, C Han, C Wang, B Lu, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Colocating high-priority, latency-sensitive (LS) and low-priority, best-effort (BE) DNN
inference services reduces the total cost of ownership (TCO) of GPU clusters. Limited by …

IOGuard: Software-Based I/O Page Fault Handling with One CPU Core

Y Dong, Z Mi - Proceedings of the 15th Asia-Pacific Symposium on …, 2024 - dl.acm.org
Nowadays, device passthrough I/O virtualization technology has played an essential role in
cloud scenarios like network connection. However, the absence of widespread support for …

APP: Enabling soft real-time execution on densely-populated hybrid memory system

ZW Wu, YC Chen, YH Chang… - 2023 60th ACM/IEEE …, 2023 - ieeexplore.ieee.org
Memory swapping was considered slow and evil, but swapping to Ultra Low-Latency
storage like Optane has become a promising solution to save power and cost, helping …

Do Predictors for Resource Overcommitment Even Predict?

G Christofidi, TD Doudali - Proceedings of the 4th Workshop on Machine …, 2024 - dl.acm.org
Resource overcommitment allows datacenters to improve resource efficiency. In this
approach, the system allocates to the users the amount of resources to be most likely used …

Harpagon: Minimizing DNN Serving Cost via Efficient Dispatching, Scheduling and Splitting

Z Zhao, Y Hu, Z Gong, G Yang, W Li, X Liu, K Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Advances in deep neural networks (DNNs) have significantly contributed to the development
of real-time video processing applications. Efficient scheduling of DNN workloads in cloud …