Deep learning workload scheduling in GPU datacenters: A survey
Deep learning (DL) has demonstrated remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …
Deep learning workload scheduling in GPU datacenters: Taxonomy, challenges and vision
Deep learning (DL) has flourished in a wide variety of fields. The development of a DL
model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU …
Serving heterogeneous machine learning models on multi-GPU servers with spatio-temporal sharing
As machine learning (ML) techniques are applied to a widening range of applications,
high-throughput ML inference serving has become critical for online services. Such ML inference …
INFless: a native serverless system for low-latency, high-throughput inference
Modern websites increasingly rely on machine learning (ML) to improve their business
efficiency. Developing and maintaining ML services incurs high costs for developers …
AsyFunc: A high-performance and resource-efficient serverless inference system via asymmetric functions
Recent advances in deep learning (DL) have spawned various intelligent cloud services
with well-trained DL models. Nevertheless, it is nontrivial to maintain the desired end-to-end …
MISO: exploiting multi-instance GPU capability on multi-tenant GPU clusters
GPU technology has been improving at a rapid pace in terms of size and performance,
empowering HPC and AI/ML researchers to advance the scientific discovery process …
HiTDL: High-throughput deep learning inference at the hybrid mobile edge
Deep neural networks (DNNs) have become a critical component for inference in modern
mobile applications, but the efficient provisioning of DNNs is non-trivial. Existing mobile- and …
A survey of multi-tenant deep learning inference on GPU
Deep Learning (DL) models have achieved superior performance. Meanwhile, computing
hardware such as NVIDIA GPUs has also demonstrated strong scaling trends, with 2x …
Model-driven cluster resource management for AI workloads in edge clouds
Since emerging edge applications such as Internet of Things (IoT) analytics and augmented
reality have tight latency constraints, hardware AI accelerators have been recently proposed …
SLAM-share: visual simultaneous localization and mapping for real-time multi-user augmented reality
Augmented reality (AR) devices perform visual simultaneous localization and mapping
(SLAM) to map the real world and localize themselves in it, enabling them to render the …