Deep learning workload scheduling in GPU datacenters: A survey

Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo… - ACM Computing …, 2024 - dl.acm.org
Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …

Deep learning workload scheduling in GPU datacenters: Taxonomy, challenges and vision

W Gao, Q Hu, Z Ye, P Sun, X Wang, Y Luo… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep learning (DL) shows its prosperity in a wide variety of fields. The development of a DL
model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU …

Serving heterogeneous machine learning models on Multi-GPU servers with Spatio-Temporal sharing

S Choi, S Lee, Y Kim, J Park, Y Kwon… - 2022 USENIX Annual …, 2022 - usenix.org
As machine learning (ML) techniques are applied to a widening range of applications, high
throughput ML inference serving has become critical for online services. Such ML inference …

INFless: a native serverless system for low-latency, high-throughput inference

Y Yang, L Zhao, Y Li, H Zhang, J Li, M Zhao… - Proceedings of the 27th …, 2022 - dl.acm.org
Modern websites increasingly rely on machine learning (ML) to improve their business
efficiency. Developing and maintaining ML services incurs high costs for developers …

AsyFunc: A high-performance and resource-efficient serverless inference system via asymmetric functions

Q Pei, Y Yuan, H Hu, Q Chen, F Liu - … of the 2023 ACM Symposium on …, 2023 - dl.acm.org
Recent advances in deep learning (DL) have spawned various intelligent cloud services
with well-trained DL models. Nevertheless, it is nontrivial to maintain the desired end-to-end …

MISO: exploiting multi-instance GPU capability on multi-tenant GPU clusters

B Li, T Patel, S Samsi, V Gadepally… - Proceedings of the 13th …, 2022 - dl.acm.org
GPU technology has been improving at an expedited pace in terms of size and performance,
empowering HPC and AI/ML researchers to advance the scientific discovery process …

HiTDL: High-throughput deep learning inference at the hybrid mobile edge

J Wu, L Wang, Q Pei, X Cui, F Liu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Deep neural networks (DNNs) have become a critical component for inference in modern
mobile applications, but the efficient provisioning of DNNs is non-trivial. Existing mobile-and …

A survey of multi-tenant deep learning inference on GPU

F Yu, D Wang, L Shangguan, M Zhang, C Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep Learning (DL) models have achieved superior performance. Meanwhile, computing
hardware like NVIDIA GPUs also demonstrated strong computing scaling trends with 2x …

Model-driven cluster resource management for AI workloads in edge clouds

Q Liang, WA Hanafy, A Ali-Eldin, P Shenoy - ACM Transactions on …, 2023 - dl.acm.org
Since emerging edge applications such as Internet of Things (IoT) analytics and augmented
reality have tight latency constraints, hardware AI accelerators have been recently proposed …

SLAM-share: visual simultaneous localization and mapping for real-time multi-user augmented reality

A Dhakal, X Ran, Y Wang, J Chen… - Proceedings of the 18th …, 2022 - dl.acm.org
Augmented reality (AR) devices perform visual simultaneous localization and mapping
(SLAM) to map the real world and localize themselves in it, enabling them to render the …