Challenges and opportunities of DNN model execution caching

Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo… - ACM Computing …, 2024 - dl.acm.org

Deep learning (DL) has demonstrated its remarkable success in a wide variety of fields. The
development of a DL model is a time-consuming and resource-intensive procedure. Hence …

Cited by 18 · Related articles · All 4 versions

Adapting Neural Networks at Runtime: Current Trends in At-Runtime Optimizations for Deep Learning

M Sponner, B Waschneck, A Kumar - ACM Computing Surveys, 2024 - dl.acm.org

Adaptive optimization methods for deep learning adjust the inference task to the current
circumstances at runtime to improve the resource footprint while maintaining the model's …

Cited by 7 · Related articles

[PDF] arxiv.org

Sparks of GPTs in edge intelligence for metaverse: Caching and inference for mobile AIGC services

M Xu, D Niyato, H Zhang, J Kang, Z Xiong… - arXiv preprint arXiv …, 2023 - arxiv.org

Aiming at achieving artificial general intelligence (AGI) for Metaverse, pretrained foundation
models (PFMs), e.g., generative pretrained transformers (GPTs), can effectively provide …

Cited by 18 · Related articles · All 2 versions

[PDF] arxiv.org

On the future of cloud engineering

D Bermbach, A Chandra, C Krintz… - … conference on cloud …, 2021 - ieeexplore.ieee.org

Ever since the commercial offerings of the Cloud started appearing in 2006, the landscape
of cloud computing has been undergoing remarkable changes with the emergence of many …

Cited by 27 · Related articles · All 12 versions

[PDF] arxiv.org

Joint foundation model caching and inference of generative AI services for edge intelligence

M Xu, D Niyato, H Zhang, J Kang… - … 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

With the rapid development of artificial general intelligence (AGI), various multimedia
services based on pretrained foundation models (PFMs) need to be effectively deployed …

Cited by 15 · Related articles · All 4 versions

[PDF] arxiv.org

Perseus: Characterizing performance and cost of multi-tenant serving for CNN models

M LeMay, S Li, T Guo - 2020 IEEE International Conference on …, 2020 - ieeexplore.ieee.org

Deep learning models are increasingly used for end-user applications, supporting both
novel features such as facial recognition, and traditional features, e.g., web search. To …

Cited by 29 · Related articles · All 7 versions

IoT Edge Orchestration for Distributed DNN Service with Containerized Resource Allocation

HB Park, TY Kim, YH Jin, SK Lee - 2022 IEEE 19th Annual …, 2022 - ieeexplore.ieee.org

Increase in demand for intelligent Internet of Things (IoT) services has brought a resource
shortage to the limited resources of edge cloud. We propose to serve intelligent IoT service …

Cited by 2 · Related articles · All 3 versions

[PDF] utem.edu.my

An Optimal Data Access Framework for Telerehabilitation System

A Muhammed, W Ismail, SN Basarang… - Journal of Advanced …, 2021 - jacta.utem.edu.my

In the telerehabilitation system, the statistical data of the patients' movement are stored in the
temporary storage and synchronised to the storage service of online cloud data. Application …

Cited by 2 · Related articles · All 2 versions

[PDF] pit-claudel.fr

[PDF] Nona: A Stochastic Congestion-Aware Job Scheduler for Real-Time Inference Queries

B Pit-Claudel, D Malak, A Cohen, M Medard… - pit-claudel.fr

This paper proposes a novel queueing-theoretic approach to enable stochastic congestion-
aware scheduling for distributed machine learning inference queries. Our proposed …

[PDF] Nona: A Stochastic Job Scheduling Approach for Latency-Sensitive Online Queries

B Pit-Claudel, D Malak, A Cohen, M Medard… - people.csail.mit.edu

This paper proposes a novel queueing-theoretic approach to enable congestion-aware
scheduling of distributed network-heavy jobs (e.g., machine learning inference). Our …