Deep learning workload scheduling in GPU datacenters: A survey

Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo… - ACM Computing …, 2024 - dl.acm.org
Deep learning (DL) has demonstrated remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …

Deep learning workload scheduling in GPU datacenters: Taxonomy, challenges and vision

W Gao, Q Hu, Z Ye, P Sun, X Wang, Y Luo… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep learning (DL) has prospered in a wide variety of fields. The development of a DL
model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU …

A survey of multi-tenant deep learning inference on GPU

F Yu, D Wang, L Shangguan, M Zhang, C Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep Learning (DL) models have achieved superior performance. Meanwhile, computing
hardware such as NVIDIA GPUs has also demonstrated strong compute scaling trends, with 2x …

Cloud-native computing: A survey from the perspective of services

S Deng, H Zhao, B Huang, C Zhang… - Proceedings of the …, 2024 - ieeexplore.ieee.org
The development of cloud computing delivery models has inspired the emergence of cloud-
native computing. Cloud-native computing, as the most influential development principle for …

Performance prediction of deep learning applications training in GPU as a service systems

M Lattuada, E Gianniti, D Ardagna, L Zhang - Cluster Computing, 2022 - Springer
Data analysts predict that the GPU as a service (GPUaaS) market will grow to support 3D
models, animated video processing, gaming, and deep learning model training. The main …

TCB: Accelerating transformer inference services with request concatenation

B Fu, F Chen, P Li, D Zeng - … of the 51st International Conference on …, 2022 - dl.acm.org
The Transformer has dominated the field of natural language processing because of its strong
capability in learning from sequential input data. In recent years, various computing and …

Miriam: Exploiting elastic kernels for real-time multi-DNN inference on edge GPU

Z Zhao, N Ling, N Guan, G Xing - … of the 21st ACM Conference on …, 2023 - dl.acm.org
Many applications, such as autonomous driving and augmented reality, require the
concurrent running of multiple deep neural networks (DNNs) that pose different levels of real …

Affinity-aware resource provisioning for long-running applications in shared clusters

C Mommessin, R Yang, NV Shakhlevich, X Sun… - Journal of Parallel and …, 2023 - Elsevier
Resource provisioning plays a pivotal role in determining the right amount of infrastructure
resources to run applications and reduce the monetary cost. A significant portion of …

Iris: Interference and resource aware predictive orchestration for ML inference serving

A Ferikoglou, P Chrysomeris… - 2023 IEEE 16th …, 2023 - ieeexplore.ieee.org
Over recent years, the ever-growing number of Machine Learning (ML) and Artificial
Intelligence (AI) applications deployed in the cloud has led to high demands on the …

A survey of large-scale deep learning serving system optimization: Challenges and opportunities

F Yu, D Wang, L Shangguan, M Zhang, X Tang… - arXiv preprint arXiv …, 2021 - arxiv.org
Deep Learning (DL) models have achieved superior performance in many application
domains, including vision, language, medical, commercial ads, entertainment, etc. With the …