Adaptive extreme edge computing for wearable devices
Wearable devices are a fast-growing technology with impact on personal healthcare for both
society and the economy. Due to the widespread use of sensors in pervasive and distributed …
Communication-efficient distributed deep learning: A comprehensive survey
Distributed deep learning (DL) has become prevalent in recent years to reduce training time
by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and …
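One representative family of techniques covered by such surveys is gradient sparsification, where each worker transmits only its largest-magnitude gradient entries. Below is a minimal sketch, not any library's actual API; the function name and the reconstruction step are illustrative, and the actual distributed exchange is omitted:

    import torch

    def topk_sparsify(grad: torch.Tensor, k: int):
        # Keep only the k largest-magnitude entries of a gradient tensor;
        # a worker would transmit just these indices and values.
        flat = grad.flatten()
        _, idx = torch.topk(flat.abs(), k)
        return idx, flat[idx]

    g = torch.randn(1024)
    idx, vals = topk_sparsify(g, k=10)   # send roughly 1% of the entries
    dense = torch.zeros_like(g)
    dense[idx] = vals                    # receiver rebuilds a sparse gradient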
Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks
The growing energy and performance costs of deep learning have driven the community to
reduce the size of neural networks by selectively pruning components. Similarly to their …
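The baseline pruning method this survey literature builds on is one-shot magnitude pruning: zero out the smallest-magnitude fraction of weights. A minimal sketch, with illustrative names and no retraining step:

    import torch

    def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
        # Return a boolean mask keeping the (1 - sparsity) fraction of
        # weights with largest magnitude.
        k = int(sparsity * weight.numel())
        if k == 0:
            return torch.ones_like(weight, dtype=torch.bool)
        threshold = weight.abs().flatten().kthvalue(k).values
        return weight.abs() > threshold

    w = torch.randn(256, 256)
    mask = magnitude_prune(w, sparsity=0.9)
    w_sparse = w * mask                  # 90% of entries are now zero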
Rigging the lottery: Making all tickets winners
Many applications require sparse neural networks due to space or inference time
restrictions. There is a large body of work on training dense networks to yield sparse …
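The RigL method this paper introduces keeps the network sparse throughout training by periodically swapping connections: dropping the smallest-magnitude active weights and growing the inactive positions with the largest gradient magnitude. A sketch of one such mask update, assuming dense gradients are available; this is not the authors' code:

    import torch

    def rigl_update(weight, grad, mask, n_swap):
        # Drop the n_swap active weights with smallest magnitude, grow the
        # n_swap inactive connections with largest gradient magnitude.
        active = mask.flatten().nonzero().squeeze(1)
        inactive = (~mask).flatten().nonzero().squeeze(1)
        drop = active[weight.flatten()[active].abs().argsort()[:n_swap]]
        grow = inactive[grad.flatten()[inactive].abs()
                        .argsort(descending=True)[:n_swap]]
        new_mask = mask.flatten().clone()
        new_mask[drop] = False
        new_mask[grow] = True            # grown weights start at zero
        return new_mask.view_as(mask)

    w, g = torch.randn(100), torch.randn(100)
    m = torch.rand(100) < 0.5            # start at ~50% density
    m = rigl_update(w, g, m, n_swap=5)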
The state of sparsity in deep neural networks
We rigorously evaluate three state-of-the-art techniques for inducing sparsity in deep neural
networks on two large-scale learning tasks: Transformer trained on WMT 2014 English-to …
Accelerating sparse deep neural networks
As neural network model sizes have dramatically increased, so has the interest in various
techniques to reduce their parameter counts and accelerate their execution. An active area …
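A concrete pattern in this line of work is 2:4 structured sparsity, where every group of four consecutive weights keeps its two largest-magnitude entries, the layout that sparse tensor cores can accelerate. A sketch of the masking step only; real deployments use vendor libraries rather than this illustrative code:

    import torch

    def mask_2_to_4(weight: torch.Tensor) -> torch.Tensor:
        # In every group of 4 consecutive weights, keep the 2 with the
        # largest magnitude; the result is exactly 50% sparse.
        assert weight.numel() % 4 == 0
        groups = weight.flatten().view(-1, 4)
        keep = groups.abs().topk(2, dim=1).indices
        mask = torch.zeros_like(groups, dtype=torch.bool).scatter_(1, keep, True)
        return mask.view_as(weight)

    w = torch.randn(128, 64)
    w_24 = w * mask_2_to_4(w)            # 2 nonzeros per group of 4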
Train big, then compress: Rethinking model size for efficient training and inference of transformers
Since hardware resources are limited, the objective of training deep learning models is
typically to maximize accuracy subject to the time and memory constraints of training and …
Scalable distributed DNN training using commodity GPU cloud computing
N Ström - 2015 - amazon.science
We introduce a new method for scaling up distributed Stochastic Gradient Descent (SGD)
training of Deep Neural Networks (DNN). The method solves the well-known communication …
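The communication bottleneck is addressed here with threshold-based gradient compression: only elements whose accumulated value exceeds a threshold are sent, quantized to plus or minus that threshold, while the remainder stays in a local residual. A sketch in that spirit, with an illustrative threshold value and no actual network transport:

    import torch

    def threshold_compress(grad, residual, tau=0.5):
        # Send only entries whose accumulated value exceeds tau, quantized
        # to +/- tau; the un-sent part is carried forward in the residual,
        # so gradient signal is delayed rather than lost.
        acc = grad + residual
        send_mask = acc.abs() >= tau
        quantized = torch.sign(acc) * tau * send_mask
        residual = acc - quantized
        return quantized, residual

    g = torch.randn(1000) * 0.1
    r = torch.zeros_like(g)
    msg, r = threshold_compress(g, r)    # msg is sparse; indices + signs suffice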
Top-KAST: Top-K always sparse training
Sparse neural networks are becoming increasingly important as the field seeks to improve
the performance of existing models by scaling them up, while simultaneously trying to …
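The core idea of Top-KAST is that the forward pass at every step uses only the top fraction of weights by magnitude, so the network is always sparse during training. A sketch of the forward mask only; the paper also backpropagates through a slightly larger weight set, which is omitted here:

    import torch

    def topkast_forward_mask(weight: torch.Tensor, forward_frac: float):
        # Select the top forward_frac of weights by magnitude for the
        # forward pass; all other weights are masked to zero.
        k = max(1, int(forward_frac * weight.numel()))
        idx = weight.abs().flatten().topk(k).indices
        mask = torch.zeros(weight.numel(), dtype=torch.bool)
        mask[idx] = True
        return mask.view_as(weight)

    w = torch.randn(512, 512)
    w_fwd = w * topkast_forward_mask(w, forward_frac=0.1)  # 90% sparse forward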
Powerpropagation: A sparsity inducing weight reparameterisation
The training of sparse neural networks is becoming an increasingly important tool for
reducing the computational footprint of models at training and evaluation, as well as enabling …
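Powerpropagation's reparameterisation expresses each effective weight as theta * |theta|^(alpha - 1) with alpha > 1, so gradient updates are implicitly scaled by weight magnitude and small weights are driven toward zero, making subsequent pruning cheaper. A minimal sketch as a linear layer; the class name, initialisation, and alpha value are illustrative:

    import torch
    import torch.nn as nn

    class PowerpropLinear(nn.Module):
        # Effective weight is theta * |theta|**(alpha - 1); gradients
        # through this map favour weights that are already large.
        def __init__(self, d_in, d_out, alpha=2.0):
            super().__init__()
            self.theta = nn.Parameter(torch.randn(d_out, d_in) * d_in ** -0.5)
            self.alpha = alpha

        def forward(self, x):
            w = self.theta * self.theta.abs().pow(self.alpha - 1.0)
            return x @ w.t()

    layer = PowerpropLinear(16, 8)
    y = layer(torch.randn(4, 16))        # gradients flow through the reparameterised w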