Deep learning workload scheduling in GPU datacenters: A survey

Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo… - ACM Computing …, 2024 - dl.acm.org
Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …

Adapting Neural Networks at Runtime: Current Trends in At-Runtime Optimizations for Deep Learning

M Sponner, B Waschneck, A Kumar - ACM Computing Surveys, 2024 - dl.acm.org
Adaptive optimization methods for deep learning adjust the inference task to the current
circumstances at runtime to improve the resource footprint while maintaining the model's …

Sparks of GPTs in edge intelligence for metaverse: Caching and inference for mobile AIGC services

M Xu, D Niyato, H Zhang, J Kang, Z Xiong… - arXiv preprint arXiv …, 2023 - arxiv.org
Aiming at achieving artificial general intelligence (AGI) for Metaverse, pretrained foundation
models (PFMs), e.g., generative pretrained transformers (GPTs), can effectively provide …

On the future of cloud engineering

D Bermbach, A Chandra, C Krintz… - … conference on cloud …, 2021 - ieeexplore.ieee.org
Ever since the commercial offerings of the Cloud started appearing in 2006, the landscape
of cloud computing has been undergoing remarkable changes with the emergence of many …

Joint foundation model caching and inference of generative AI services for edge intelligence

M Xu, D Niyato, H Zhang, J Kang… - … 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
With the rapid development of artificial general intelligence (AGI), various multimedia
services based on pretrained foundation models (PFMs) need to be effectively deployed …

Perseus: Characterizing performance and cost of multi-tenant serving for CNN models

M LeMay, S Li, T Guo - 2020 IEEE International Conference on …, 2020 - ieeexplore.ieee.org
Deep learning models are increasingly used for end-user applications, supporting both
novel features such as facial recognition, and traditional features, e.g., web search. To …

IoT Edge Orchestration for Distributed DNN Service with Containerized Resource Allocation

HB Park, TY Kim, YH Jin, SK Lee - 2022 IEEE 19th Annual …, 2022 - ieeexplore.ieee.org
Increase in demand for intelligent Internet of Things (IoT) services has brought a resource
shortage to the limited resources of edge cloud. We propose to serve intelligent IoT service …

An Optimal Data Access Framework for Telerehabilitation System

A Muhammed, W Ismail, SN Basarang… - Journal of Advanced …, 2021 - jacta.utem.edu.my
In the telerehabilitation system, the statistical data of the patients' movement are stored in the
temporary storage and synchronised to the storage service of online cloud data. Application …

Nona: A Stochastic Congestion-Aware Job Scheduler for Real-Time Inference Queries

B Pit-Claudel, D Malak, A Cohen, M Medard… - pit-claudel.fr
This paper proposes a novel queueing-theoretic approach to enable stochastic congestion-
aware scheduling for distributed machine learning inference queries. Our proposed …

Nona: A Stochastic Job Scheduling Approach for Latency-Sensitive Online Queries

B Pit-Claudel, D Malak, A Cohen, M Medard… - people.csail.mit.edu
This paper proposes a novel queueing-theoretic approach to enable congestion-aware
scheduling of distributed network-heavy jobs (e.g., machine learning inference). Our …