Deep learning workload scheduling in GPU datacenters: A survey
Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …
Adapting Neural Networks at Runtime: Current Trends in At-Runtime Optimizations for Deep Learning
Adaptive optimization methods for deep learning adjust the inference task to the current
circumstances at runtime to improve the resource footprint while maintaining the model's …
Sparks of GPTs in edge intelligence for metaverse: Caching and inference for mobile AIGC services
Aiming at achieving artificial general intelligence (AGI) for Metaverse, pretrained foundation
models (PFMs), e.g., generative pretrained transformers (GPTs), can effectively provide …
On the future of cloud engineering
D Bermbach, A Chandra, C Krintz… - … conference on cloud …, 2021 - ieeexplore.ieee.org
Ever since the commercial offerings of the Cloud started appearing in 2006, the landscape
of cloud computing has been undergoing remarkable changes with the emergence of many …
Joint foundation model caching and inference of generative AI services for edge intelligence
With the rapid development of artificial general intelligence (AGI), various multimedia
services based on pretrained foundation models (PFMs) need to be effectively deployed …
Perseus: Characterizing performance and cost of multi-tenant serving for CNN models
M LeMay, S Li, T Guo - 2020 IEEE International Conference on …, 2020 - ieeexplore.ieee.org
Deep learning models are increasingly used for end-user applications, supporting both
novel features such as facial recognition, and traditional features, e.g., web search. To …
IoT Edge Orchestration for Distributed DNN Service with Containerized Resource Allocation
HB Park, TY Kim, YH Jin, SK Lee - 2022 IEEE 19th Annual …, 2022 - ieeexplore.ieee.org
Increase in demand for intelligent Internet of Things (IoT) services has brought a resource
shortage to the limited resources of edge cloud. We propose to serve intelligent IoT service …
An Optimal Data Access Framework for Telerehabilitation System
A Muhammed, W Ismail, SN Basarang… - Journal of Advanced …, 2021 - jacta.utem.edu.my
In the telerehabilitation system, the statistical data of the patients' movement are stored in the
temporary storage and synchronised to the storage service of online cloud data. Application …
Nona: A Stochastic Congestion-Aware Job Scheduler for Real-Time Inference Queries
This paper proposes a novel queueing-theoretic approach to enable stochastic congestion-
aware scheduling for distributed machine learning inference queries. Our proposed …
Nona: A Stochastic Job Scheduling Approach for Latency-Sensitive Online Queries
This paper proposes a novel queueing-theoretic approach to enable congestion-aware
scheduling of distributed network-heavy jobs (e.g., machine learning inference). Our …