Deep learning workload scheduling in GPU datacenters: A survey
Deep learning (DL) has demonstrated remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …
Deep learning workload scheduling in GPU datacenters: Taxonomy, challenges and vision
Deep learning (DL) has flourished in a wide variety of fields. The development of a DL
model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU …
A survey of multi-tenant deep learning inference on GPU
Deep Learning (DL) models have achieved superior performance. Meanwhile, computing
hardware like NVIDIA GPUs has also demonstrated strong computing scaling trends with 2x …
Cloud-native computing: A survey from the perspective of services
The development of cloud computing delivery models inspires the emergence of cloud-native
computing. Cloud-native computing, as the most influential development principle for …
Performance prediction of deep learning applications training in GPU as a service systems
Data analysts predict that the GPU as a service (GPUaaS) market will grow to support 3D
models, animated video processing, gaming, and deep learning model training. The main …
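Performance prediction for GPUaaS training typically fits a model to profiled runs and extrapolates to unseen configurations. The following is only a generic illustration of that idea, not the paper's actual model: a least-squares linear fit of per-iteration time against batch size, using hypothetical profiled data.

```python
# Generic illustration (not the surveyed paper's method): fit a linear
# performance model t = a + b * batch_size to profiled iteration times,
# then predict the per-iteration time for an unseen batch size.
def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = a + b * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Hypothetical profiled data: (batch size, seconds per iteration)
xs = [16, 32, 64]
ys = [0.05, 0.09, 0.17]
a, b = fit_linear(xs, ys)
pred = a + b * 128  # predicted seconds per iteration at batch size 128
```

Real predictors in this space also account for GPU count, communication overhead, and model architecture; the linear fit above is only the simplest instance of the approach.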
TCB: Accelerating transformer inference services with request concatenation
Transformer has dominated the field of natural language processing because of its strong
capability in learning from sequential input data. In recent years, various computing and …
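The core idea behind concatenating requests is to merge several variable-length inference requests into one padded batch so a single forward pass serves them all. A minimal sketch of that batching step follows; it is not TCB's actual implementation, and the names `pad_batch` and `model_forward` are hypothetical.

```python
# Illustrative sketch (not TCB's implementation): serve several transformer
# requests with one forward pass by concatenating them into a padded batch.
def pad_batch(requests, pad_id=0):
    """Pad variable-length token sequences to a common length and build a mask."""
    max_len = max(len(r) for r in requests)
    batch = [r + [pad_id] * (max_len - len(r)) for r in requests]
    mask = [[1] * len(r) + [0] * (max_len - len(r)) for r in requests]
    return batch, mask

def model_forward(batch):
    """Stand-in for a transformer forward pass: one output per sequence."""
    return [sum(row) for row in batch]  # dummy per-sequence computation

requests = [[5, 3], [7, 1, 4, 2], [9]]
batch, mask = pad_batch(requests)
outputs = model_forward(batch)  # one pass amortized over all three requests
```

The attention mask lets the model ignore padding tokens, which is what makes serving heterogeneous request lengths in one batch correct.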
Miriam: Exploiting elastic kernels for real-time multi-DNN inference on edge GPU
Many applications, such as autonomous driving and augmented reality, require the
concurrent running of multiple deep neural networks (DNNs) that pose different levels of real …
Affinity-aware resource provisioning for long-running applications in shared clusters
Resource provisioning plays a pivotal role in determining the right amount of infrastructure
resources to run applications and reduce the monetary cost. A significant portion of …
Iris: Interference and resource aware predictive orchestration for ML inference serving
A Ferikoglou, P Chrysomeris… - 2023 IEEE 16th …, 2023 - ieeexplore.ieee.org
Over the last years, the ever-growing number of Machine Learning (ML) and Artificial
Intelligence (AI) applications deployed in the Cloud has led to high demands on the …
A survey of large-scale deep learning serving system optimization: Challenges and opportunities
Deep Learning (DL) models have achieved superior performance in many application
domains, including vision, language, medical, commercial ads, entertainment, etc. With the …