A survey on distributed machine learning

J Verbraeken, M Wolting, J Katzy… - ACM Computing Surveys …, 2020 - dl.acm.org
The demand for artificial intelligence has grown significantly over the past decade, and this
growth has been fueled by advances in machine learning techniques and the ability to …

Demystifying parallel and distributed deep learning: An in-depth concurrency analysis

T Ben-Nun, T Hoefler - ACM Computing Surveys (CSUR), 2019 - dl.acm.org
Deep Neural Networks (DNNs) are becoming an important tool in modern computing
applications. Accelerating their training is a major challenge, and techniques range from …

A field guide to federated optimization

J Wang, Z Charles, Z Xu, G Joshi, HB McMahan… - arXiv preprint arXiv …, 2021 - arxiv.org
Federated learning and analytics are distributed approaches for collaboratively learning
models (or statistics) from decentralized data, motivated by and designed for privacy …
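
The baseline algorithm for this setting in the field guide is federated averaging (FedAvg). Below is a minimal illustrative sketch, not the paper's code: the toy least-squares objective, the client-sampling fraction, and helper names such as `local_sgd` and `fedavg_round` are assumptions made here. Sampled clients run local SGD on their own shards, and the server averages the results weighted by shard size.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_sgd(w, X, y, lr=0.01, epochs=1, batch=8):
    """Plain SGD on one client's least-squares shard (illustrative toy objective)."""
    w = w.copy()
    for _ in range(epochs):
        idx = rng.permutation(len(X))
        for s in range(0, len(X), batch):
            b = idx[s:s + batch]
            grad = X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad
    return w

def fedavg_round(w_global, clients, frac=0.5):
    """One FedAvg round: sample clients, train locally, average weighted by shard size."""
    m = max(1, int(frac * len(clients)))
    chosen = rng.choice(len(clients), size=m, replace=False)
    shard_sizes = np.array([len(clients[i][0]) for i in chosen], dtype=float)
    local_models = [local_sgd(w_global, *clients[i]) for i in chosen]
    weights = shard_sizes / shard_sizes.sum()
    return sum(p * wk for p, wk in zip(weights, local_models))

# Toy federation: 4 clients whose shards are drawn from shifted distributions.
d = 5
clients = []
for k in range(4):
    Xk = rng.normal(loc=k, size=(40, d))
    yk = Xk @ np.ones(d) + 0.1 * rng.normal(size=40)
    clients.append((Xk, yk))

w = np.zeros(d)
for _ in range(50):
    w = fedavg_round(w, clients)
print(np.linalg.norm(w - np.ones(d)))  # distance to the shared true weights
```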

Parallel restarted SGD with faster convergence and less communication: Demystifying why model averaging works for deep learning

H Yu, S Yang, S Zhu - Proceedings of the AAAI Conference on Artificial …, 2019 - ojs.aaai.org
In distributed training of deep neural networks, parallel minibatch SGD is widely used to
speed up training: multiple workers sample …
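
The model-averaging scheme the title refers to is commonly written as local SGD with periodic restarts: each worker takes I independent SGD steps, then all workers restart from the averaged model. A minimal sketch on a toy noisy quadratic follows; the objective, step size, and averaging period are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(1)
d, workers, steps, I, lr = 10, 4, 200, 10, 0.05

# Toy objective f(x) = 0.5 * ||x - x_star||^2 with noisy gradients (illustrative only).
x_star = rng.normal(size=d)

def stochastic_grad(x):
    return (x - x_star) + 0.1 * rng.normal(size=d)

# All workers start from the same point.
x = [np.zeros(d) for _ in range(workers)]

for t in range(1, steps + 1):
    # Local step: each worker updates its own copy, no communication.
    x = [xi - lr * stochastic_grad(xi) for xi in x]
    # Every I steps, average the models and restart every worker from the average.
    if t % I == 0:
        avg = sum(x) / workers
        x = [avg.copy() for _ in range(workers)]

print(np.linalg.norm(sum(x) / workers - x_star))
```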

A survey on deep learning for big data

Q Zhang, LT Yang, Z Chen, P Li - Information Fusion, 2018 - Elsevier
Deep learning, currently one of the most remarkable machine learning techniques, has
achieved great success in many applications such as image analysis, speech recognition …

Accurate, large minibatch SGD: Training ImageNet in 1 hour

P Goyal, P Dollár, R Girshick, P Noordhuis… - arXiv preprint arXiv …, 2017 - arxiv.org
Deep learning thrives with large neural networks and large datasets. However, larger
networks and larger datasets result in longer training times that impede research and …
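
The recipe this abstract is known for is the linear learning-rate scaling rule with gradual warmup: multiply the learning rate by the same factor as the minibatch size, and ramp it up over the first few epochs. A sketch of that schedule is below; the base learning rate of 0.1 for batch 256, the 8192 target batch, and the 5-epoch warmup follow the commonly cited description, but treat the exact constants here as assumptions.

```python
def scaled_lr(epoch, step, steps_per_epoch,
              base_lr=0.1, base_batch=256, batch=8192, warmup_epochs=5):
    """Linear scaling rule with gradual warmup.

    Target LR is base_lr * (batch / base_batch); during the first
    `warmup_epochs` epochs the LR ramps linearly from base_lr to the target.
    """
    target = base_lr * batch / base_batch
    if epoch >= warmup_epochs:
        return target
    progress = (epoch * steps_per_epoch + step) / (warmup_epochs * steps_per_epoch)
    return base_lr + progress * (target - base_lr)

# Example: LR at the start, mid-warmup, and after warmup (100 steps per epoch).
print(scaled_lr(0, 0, 100))    # 0.1
print(scaled_lr(2, 50, 100))   # halfway through the warmup ramp
print(scaled_lr(5, 0, 100))    # 3.2 = 0.1 * 8192 / 256
```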

Don't use large mini-batches, use local SGD

T Lin, SU Stich, KK Patel, M Jaggi - arXiv preprint arXiv:1808.07217, 2018 - arxiv.org
Mini-batch stochastic gradient methods (SGD) are state of the art for distributed training of
deep neural networks. Drastic increases in mini-batch sizes have led to key efficiency …
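
The trade-off argued in this snippet can be made concrete with simple arithmetic: for the same total number of gradient evaluations, synchronous mini-batch SGD communicates at every update, whereas local SGD with H local steps communicates H times less often. The worker count, batch size, and H below are illustrative assumptions.

```python
# Assumed setup: K workers, per-worker batch b, H local steps, budget of G gradient samples.
K, b, H, G = 16, 32, 8, 1_000_000

minibatch_updates = G // (K * b)                 # every update syncs all workers
minibatch_comm_rounds = minibatch_updates

local_sgd_steps = G // (K * b)                   # same number of per-worker steps
local_sgd_comm_rounds = local_sgd_steps // H     # sync only every H local steps

print(minibatch_comm_rounds, local_sgd_comm_rounds)  # 1953 vs 244
```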

On the linear speedup analysis of communication efficient momentum SGD for distributed non-convex optimization

H Yu, R Jin, S Yang - International Conference on Machine …, 2019 - proceedings.mlr.press
Recent developments in large-scale distributed machine learning applications, e.g., deep
neural networks, benefit enormously from the advances in distributed non-convex …
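
One common way to write the kind of communication-efficient momentum SGD the title refers to: each worker runs local heavy-ball momentum steps, and the models are averaged every I iterations. With K workers, step size \eta, momentum \beta, and averaging period I, one illustrative formulation (how the momentum buffers are handled at averaging time differs across variants, so this is not the paper's exact algorithm) is:

```latex
v_t^{(k)} = \beta\, v_{t-1}^{(k)} + \nabla f_k\bigl(x_{t-1}^{(k)};\, \xi_t^{(k)}\bigr),
\qquad
x_t^{(k)} =
\begin{cases}
x_{t-1}^{(k)} - \eta\, v_t^{(k)}, & t \bmod I \neq 0,\\[4pt]
\dfrac{1}{K} \sum_{j=1}^{K} \bigl(x_{t-1}^{(j)} - \eta\, v_t^{(j)}\bigr), & t \bmod I = 0.
\end{cases}
```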

Qsparse-local-SGD: Distributed SGD with quantization, sparsification and local computations

D Basu, D Data, C Karakus… - Advances in Neural …, 2019 - proceedings.neurips.cc
The communication bottleneck has been identified as a significant issue in distributed
optimization of large-scale learning models. Recently, several approaches to mitigate this …
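
The combination named in the title can be sketched as a compression operator applied to each worker's update before communication, with an error-feedback memory that re-injects whatever was dropped into the next round. The operator below (top-k sparsification followed by a crude sign-magnitude quantization) is one illustrative instance, not the paper's exact operator or schedule.

```python
import numpy as np

def compress(update, memory, k=10):
    """Top-k sparsify + sign-magnitude quantization with error feedback.

    `memory` accumulates whatever this step throws away, so it is added back
    into the next update (error compensation). Illustrative sketch only.
    """
    corrected = update + memory
    # Keep only the k largest-magnitude coordinates.
    idx = np.argpartition(np.abs(corrected), -k)[-k:]
    sparse = np.zeros_like(corrected)
    # Quantize kept coordinates to a single shared magnitude times their sign.
    scale = np.mean(np.abs(corrected[idx]))
    sparse[idx] = scale * np.sign(corrected[idx])
    new_memory = corrected - sparse          # what was lost, kept for next round
    return sparse, new_memory

# Example: compress a random 100-dim update; only 10 coordinates survive.
rng = np.random.default_rng(3)
g = rng.normal(size=100)
mem = np.zeros(100)
compressed, mem = compress(g, mem)
print(np.count_nonzero(compressed))  # 10
```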

On the convergence of local descent methods in federated learning

F Haddadpour, M Mahdavi - arXiv preprint arXiv:1910.14425, 2019 - arxiv.org
In federated distributed learning, the goal is to optimize a global training objective defined
over distributed devices, where the data shard at each device is sampled from a possibly …
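
The global objective described here is conventionally written as a weighted sum of per-device objectives, each an expectation over that device's (possibly non-identically distributed) data shard; with K devices and weights p_k, a standard formulation is:

```latex
\min_{x \in \mathbb{R}^d} \; F(x) = \sum_{k=1}^{K} p_k\, F_k(x),
\qquad
F_k(x) = \mathbb{E}_{\xi \sim \mathcal{D}_k}\bigl[f(x;\, \xi)\bigr],
\qquad
p_k \ge 0,\; \sum_{k=1}^{K} p_k = 1.
```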