Distributed artificial intelligence empowered by end-edge-cloud computing: A survey
As the computing paradigm shifts from cloud computing to end-edge-cloud computing, artificial
intelligence is likewise evolving from a centralized manner to a distributed one …
Efficient deep learning: A survey on making deep learning models smaller, faster, and better
G Menghani - ACM Computing Surveys, 2023 - dl.acm.org
Deep learning has revolutionized the fields of computer vision, natural language
understanding, speech recognition, information retrieval, and more. However, with the …
SparseGPT: Massive language models can be accurately pruned in one-shot
E Frantar, D Alistarh - International Conference on Machine …, 2023 - proceedings.mlr.press
We show for the first time that large-scale generative pretrained transformer (GPT) family
models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal …
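
For intuition, here is a minimal one-shot pruning sketch, assuming NumPy: plain magnitude pruning to 50% unstructured sparsity with no retraining. SparseGPT itself scores and reconstructs weights layer by layer using Hessian information rather than raw magnitudes; the function name and toy matrix below are hypothetical.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """One-shot unstructured pruning: zero out the k smallest-magnitude
    weights, no retraining. (SparseGPT uses Hessian-based per-layer
    reconstruction instead; magnitude pruning is only an illustration.)"""
    k = int(weights.size * sparsity)                  # number of weights to drop
    threshold = np.partition(np.abs(weights), k - 1, axis=None)[k - 1]
    mask = np.abs(weights) > threshold                # keep weights above threshold
    return weights * mask

rng = np.random.default_rng(0)
W = rng.normal(size=(1024, 1024)).astype(np.float32)  # toy weight matrix
W_sparse = magnitude_prune(W, sparsity=0.5)
print(f"sparsity: {np.mean(W_sparse == 0):.2%}")      # ~50% zeros
```
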
DepGraph: Towards any structural pruning
Structural pruning enables model acceleration by removing structurally-grouped parameters
from neural networks. However, the parameter-grouping patterns vary widely across …
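
A minimal sketch of why grouping matters, assuming PyTorch: dropping output channels of one convolution forces coupled edits to the following BatchNorm and convolution. DepGraph builds a dependency graph to discover such groups automatically; the layers and kept-channel choice below are illustrative only.

```python
import torch
import torch.nn as nn

# Hand-written version of one structural-pruning group (not the DepGraph API).
conv1 = nn.Conv2d(3, 16, 3, padding=1)
bn1   = nn.BatchNorm2d(16)
conv2 = nn.Conv2d(16, 32, 3, padding=1)

keep = torch.tensor([i for i in range(16) if i % 2 == 0])  # keep 8 of 16 channels

# Group member 1: conv1's output channels and the matching BatchNorm state.
conv1.weight = nn.Parameter(conv1.weight.data[keep])
conv1.bias   = nn.Parameter(conv1.bias.data[keep])
bn1.weight = nn.Parameter(bn1.weight.data[keep])
bn1.bias   = nn.Parameter(bn1.bias.data[keep])
bn1.running_mean = bn1.running_mean[keep]
bn1.running_var  = bn1.running_var[keep]
bn1.num_features = len(keep)

# Group member 2: conv2 must drop the same channels on its *input* axis.
conv2.weight = nn.Parameter(conv2.weight.data[:, keep])

x = torch.randn(1, 3, 32, 32)
y = conv2(bn1(conv1(x)))   # still runs: all shapes stay consistent
print(y.shape)             # torch.Size([1, 32, 32, 32])
```
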
Patch diffusion: Faster and more data-efficient training of diffusion models
Diffusion models are powerful, but they require a lot of time and data to train. We propose
Patch Diffusion, a generic patch-wise training framework, to significantly reduce the training …
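
A rough sketch of the patch-wise training input, assuming PyTorch: random crops plus normalized coordinate channels, so the denoiser knows where each patch sits in the full image. The helper name and toy shapes are hypothetical, and the actual framework also schedules patch sizes and mixes in full-resolution steps.

```python
import torch

def sample_patch(images: torch.Tensor, patch: int) -> torch.Tensor:
    """Crop one random patch per image and append two normalized
    coordinate channels (x/y position in the full image)."""
    n, c, h, w = images.shape
    ys = torch.randint(0, h - patch + 1, (n,))
    xs = torch.randint(0, w - patch + 1, (n,))
    crops, coords = [], []
    for i in range(n):
        y0, x0 = ys[i].item(), xs[i].item()
        crops.append(images[i, :, y0:y0 + patch, x0:x0 + patch])
        yy = torch.arange(y0, y0 + patch, dtype=torch.float32) / (h - 1)
        xx = torch.arange(x0, x0 + patch, dtype=torch.float32) / (w - 1)
        gy, gx = torch.meshgrid(yy, xx, indexing="ij")
        coords.append(torch.stack([gy, gx]))
    return torch.cat([torch.stack(crops), torch.stack(coords)], dim=1)

batch = torch.randn(8, 3, 64, 64)          # toy image batch
inputs = sample_patch(batch, patch=16)     # (8, 3 + 2, 16, 16) denoiser inputs
print(inputs.shape)
```
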
On-device training under 256KB memory
On-device training enables the model to adapt to new data collected from the sensors by
fine-tuning a pre-trained model. Users can benefit from customized AI models without having …
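
As a simplified illustration of memory-frugal on-device fine-tuning (not the paper's exact system, which also uses sparse layer selection and quantized training), the PyTorch sketch below freezes all weights and updates only biases plus the classifier head; bias gradients do not require storing input activations, which is one way training memory shrinks. The model and names are hypothetical.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 10),
)

# Freeze everything except bias terms and the final Linear head ("4." in
# this Sequential); only these few parameters are updated on-device.
for name, p in model.named_parameters():
    p.requires_grad = name.endswith("bias") or name.startswith("4.")

trainable = [p for p in model.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable), "trainable params out of",
      sum(p.numel() for p in model.parameters()))

opt = torch.optim.SGD(trainable, lr=1e-2)
x, y = torch.randn(4, 3, 32, 32), torch.randint(0, 10, (4,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()   # one on-device adaptation step
```
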
Optimal brain compression: A framework for accurate post-training quantization and pruning
E Frantar, D Alistarh - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We consider the problem of model compression for deep neural networks (DNNs) in the
challenging one-shot/post-training setting, in which we are given an accurate trained model …
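
A toy version of the layer-wise objective behind such one-shot methods, assuming NumPy: compress W to W_hat while keeping ||WX - W_hat X||^2 small on calibration data. The sketch prunes one weight per row with a crude saliency and re-fits the rest by least squares; OBC does this greedily for many weights with efficient inverse-Hessian updates.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, n = 16, 32, 256
W = rng.normal(size=(d_out, d_in))
X = rng.normal(size=(d_in, n))            # calibration activations
H = X @ X.T                               # proxy for the layer Hessian

W_hat = W.copy()
for r in range(d_out):
    # Crude saliency: small weight on a low-energy input dimension.
    # (OBS/OBC proper score with w_p^2 / [H^-1]_pp instead.)
    p = int(np.argmin(W[r] ** 2 * np.diag(H)))
    keep = np.arange(d_in) != p
    W_hat[r, p] = 0.0
    # Re-fit kept weights so the layer output matches on calibration data.
    W_hat[r, keep] = np.linalg.lstsq(X[keep].T, W[r] @ X, rcond=None)[0]

err = np.linalg.norm(W @ X - W_hat @ X) / np.linalg.norm(W @ X)
print(f"relative output error after pruning one weight per row: {err:.2e}")
```
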
A survey of quantization methods for efficient neural network inference
This chapter provides approaches to the problem of quantizing the numerical values in deep
Neural Network computations, covering the advantages/disadvantages of current methods …
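
For reference, a minimal sketch of asymmetric uniform quantization, the baseline scheme most of the surveyed methods refine, assuming NumPy; the helper name is hypothetical.

```python
import numpy as np

def quantize_uniform(x: np.ndarray, num_bits: int = 8):
    """Asymmetric uniform quantization: map floats to integers via a
    scale and zero-point; dequantizing reveals the rounding error."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

x = np.random.default_rng(0).normal(size=1024).astype(np.float32)
q, scale, zp = quantize_uniform(x)
x_hat = (q.astype(np.float32) - zp) * scale          # dequantize
print(f"max abs error: {np.abs(x - x_hat).max():.4f}")  # bounded by ~scale/2
```
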
Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks
The growing energy and performance costs of deep learning have driven the community to
reduce the size of neural networks by selectively pruning components. Similarly to their …
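
A small sketch of the prune-and-grow idea the title alludes to, assuming NumPy: one dynamic-sparse-training step (in the spirit of RigL, one instance of the family surveyed) that drops the weakest active weights and regrows where dense gradients are largest. All names are illustrative.

```python
import numpy as np

def prune_and_grow(w, grad, mask, k):
    """Drop the k smallest-magnitude active weights, then regrow k
    inactive positions where the dense gradient is largest; total
    density stays constant."""
    active = np.flatnonzero(mask)
    drop = active[np.argsort(np.abs(w[active]))[:k]]
    mask[drop] = False
    w[drop] = 0.0                                   # pruned weights are zeroed
    inactive = np.flatnonzero(~mask)
    grow = inactive[np.argsort(-np.abs(grad[inactive]))[:k]]
    mask[grow] = True                               # grown weights re-enter at zero
    return w, mask

rng = np.random.default_rng(0)
w, grad = rng.normal(size=1000), rng.normal(size=1000)
mask = rng.random(1000) < 0.2                       # start at ~80% sparsity
w[~mask] = 0.0
w, mask = prune_and_grow(w, grad, mask, k=10)
print(f"density unchanged: {mask.mean():.1%}")
```
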
EfficientViT: Lightweight multi-scale attention for high-resolution dense prediction
High-resolution dense prediction enables many appealing real-world applications, such as
computational photography, autonomous driving, etc. However, the vast computational cost …
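
A sketch of the kind of lightweight attention involved, assuming NumPy: ReLU-kernel linear attention, which avoids forming the N x N attention matrix and is the core operation in EfficientViT-style modules. The function below is a simplified single-head version with hypothetical names.

```python
import numpy as np

def relu_linear_attention(q, k, v, eps=1e-6):
    """Replace softmax(QK^T)V with phi(Q)(phi(K)^T V), phi = ReLU.
    Cost drops from O(N^2 d) to O(N d^2), which is what makes
    high-resolution (large-N) dense prediction affordable."""
    phi_q, phi_k = np.maximum(q, 0), np.maximum(k, 0)
    kv = phi_k.T @ v                                 # (d, d) summary, built once
    z = phi_q @ phi_k.sum(axis=0, keepdims=True).T   # per-query normalizer
    return (phi_q @ kv) / (z + eps)

n, d = 64 * 64, 32                    # 64x64 feature map -> N = 4096 tokens
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
out = relu_linear_attention(q, k, v)
print(out.shape)                      # (4096, 32), no N x N matrix formed
```
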