Distributed artificial intelligence empowered by end-edge-cloud computing: A survey

S Duan, D Wang, J Ren, F Lyu, Y Zhang… - … Surveys & Tutorials, 2022 - ieeexplore.ieee.org
As the computing paradigm shifts from cloud computing to end-edge-cloud computing, it
also supports artificial intelligence evolving from a centralized manner to a distributed one …

Efficient deep learning: A survey on making deep learning models smaller, faster, and better

G Menghani - ACM Computing Surveys, 2023 - dl.acm.org
Deep learning has revolutionized the fields of computer vision, natural language
understanding, speech recognition, information retrieval, and more. However, with the …

Sparsegpt: Massive language models can be accurately pruned in one-shot

E Frantar, D Alistarh - International Conference on Machine …, 2023 - proceedings.mlr.press
We show for the first time that large-scale generative pretrained transformer (GPT) family
models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal …

Depgraph: Towards any structural pruning

G Fang, X Ma, M Song, MB Mi… - Proceedings of the …, 2023 - openaccess.thecvf.com
Structural pruning enables model acceleration by removing structurally-grouped parameters
from neural networks. However, the parameter-grouping patterns vary widely across …

Patch diffusion: Faster and more data-efficient training of diffusion models

Z Wang, Y Jiang, H Zheng, P Wang… - Advances in neural …, 2024 - proceedings.neurips.cc
Diffusion models are powerful, but they require a lot of time and data to train. We propose
Patch Diffusion, a generic patch-wise training framework, to significantly reduce the training …

On-device training under 256kb memory

J Lin, L Zhu, WM Chen, WC Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
On-device training enables the model to adapt to new data collected from the sensors by
fine-tuning a pre-trained model. Users can benefit from customized AI models without having …

Optimal brain compression: A framework for accurate post-training quantization and pruning

E Frantar, D Alistarh - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We consider the problem of model compression for deep neural networks (DNNs) in the
challenging one-shot/post-training setting, in which we are given an accurate trained model …

A survey of quantization methods for efficient neural network inference

A Gholami, S Kim, Z Dong, Z Yao… - Low-Power Computer …, 2022 - taylorfrancis.com
This chapter provides approaches to the problem of quantizing the numerical values in deep
Neural Network computations, covering the advantages/disadvantages of current methods …

Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks

T Hoefler, D Alistarh, T Ben-Nun, N Dryden… - Journal of Machine …, 2021 - jmlr.org
The growing energy and performance costs of deep learning have driven the community to
reduce the size of neural networks by selectively pruning components. Similarly to their …

Efficientvit: Lightweight multi-scale attention for high-resolution dense prediction

H Cai, J Li, M Hu, C Gan, S Han - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
High-resolution dense prediction enables many appealing real-world applications, such as
computational photography, autonomous driving, etc. However, the vast computational cost …