A survey of multi-tenant deep learning inference on GPU
Deep Learning (DL) models have achieved superior performance. Meanwhile, computing
hardware like NVIDIA GPUs also demonstrated strong computing scaling trends with 2x …
One-shot tuner for deep learning compilers
Auto-tuning DL compilers are gaining ground as an optimizing back-end for DL frameworks.
While existing work can generate deep learning models that exceed the performance of …
Metatune: Meta-learning based cost model for fast and efficient auto-tuning frameworks
J Ryu, H Sung - arXiv preprint arXiv:2102.04199, 2021 - arxiv.org
Deep learning compiler frameworks are gaining ground as a more portable back-end for
deep learning applications on increasingly diverse hardware. However, they face the …
Exploring the Diversity of Multiple Job Deployments over GPUs for Efficient Resource Sharing
Graphic Processing Units (GPUs) are gradually becoming mainstream computing resource
for efficient execution of applications both on-premises and in the cloud. Currently however …
Slice-tune: A system for high performance DNN autotuning
Autotuning DNN models prior to their deployment is an essential but time-consuming task.
Using expensive (and power-hungry) GPU and TPU accelerators efficiently is also key …
Job Recommendation Service for GPU Sharing in Kubernetes
Cloud infrastructures encourage the multi-tenancy of hardware resources. User-defined
Machine Learning (ML) training jobs are offloaded to the cloud for efficient training. State-of …
Application-aware Resource Sharing using Software and Hardware Partitioning on Modern GPUs
Graphic Processing Units (GPUs) are known for the large computing capabilities they offer
users compared to traditional CPUs. However, the issue of resource under-utilization is …
[BOOK][B] Cooperative Design of Machine Learning and GPU-Based Systems for Inference
A Dhakal - 2022 - search.proquest.com
Our work seeks to improve and adapt computing systems and machine learning (ML)
algorithms to match each other's requirements and capabilities. Due to the high …