A survey on distributed machine learning

J Verbraeken, M Wolting, J Katzy… - ACM Computing Surveys …, 2020 - dl.acm.org
The demand for artificial intelligence has grown significantly over the past decade, and this
growth has been fueled by advances in machine learning techniques and the ability to …

Demystifying parallel and distributed deep learning: An in-depth concurrency analysis

T Ben-Nun, T Hoefler - ACM Computing Surveys (CSUR), 2019 - dl.acm.org
Deep Neural Networks (DNNs) are becoming an important tool in modern computing
applications. Accelerating their training is a major challenge, and techniques range from …

A field guide to federated optimization

J Wang, Z Charles, Z Xu, G Joshi, HB McMahan… - arXiv preprint arXiv …, 2021 - arxiv.org
Federated learning and analytics are distributed approaches for collaboratively learning
models (or statistics) from decentralized data, motivated by and designed for privacy …
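
The baseline algorithm for this setting in the field guide is federated averaging (FedAvg). Below is a minimal illustrative sketch, not the paper's code: the toy least-squares objective, the client-sampling fraction, and helper names such as `local_sgd` and `fedavg_round` are assumptions made here. Sampled clients run local SGD on their own shards, and the server averages the results weighted by shard size.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_sgd(w, X, y, lr=0.01, epochs=1, batch=8):
    """Plain SGD on one client's least-squares shard (illustrative toy objective)."""
    w = w.copy()
    for _ in range(epochs):
        idx = rng.permutation(len(X))
        for s in range(0, len(X), batch):
            b = idx[s:s + batch]
            grad = X[b].T @ (X[b] @ w - y[b]) / len(b)
            w -= lr * grad
    return w

def fedavg_round(w_global, clients, frac=0.5):
    """One FedAvg round: sample clients, train locally, average weighted by shard size."""
    m = max(1, int(frac * len(clients)))
    chosen = rng.choice(len(clients), size=m, replace=False)
    shard_sizes = np.array([len(clients[i][0]) for i in chosen], dtype=float)
    local_models = [local_sgd(w_global, *clients[i]) for i in chosen]
    weights = shard_sizes / shard_sizes.sum()
    return sum(p * wk for p, wk in zip(weights, local_models))

# Toy federation: 4 clients whose shards are drawn from shifted distributions.
d = 5
clients = []
for k in range(4):
    Xk = rng.normal(loc=k, size=(40, d))
    yk = Xk @ np.ones(d) + 0.1 * rng.normal(size=40)
    clients.append((Xk, yk))

w = np.zeros(d)
for _ in range(50):
    w = fedavg_round(w, clients)
print(np.linalg.norm(w - np.ones(d)))  # distance to the shared true weights
```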

Parallel restarted SGD with faster convergence and less communication: Demystifying why model averaging works for deep learning

H Yu, S Yang, S Zhu - Proceedings of the AAAI Conference on Artificial …, 2019 - ojs.aaai.org
In distributed training of deep neural networks, parallel minibatch SGD is widely used to
speed up training: multiple workers sample …
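
The model-averaging scheme the title refers to is commonly written as local SGD with periodic restarts: each worker takes I independent SGD steps, then all workers restart from the averaged model. A minimal sketch on a toy noisy quadratic follows; the objective, step size, and averaging period are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(1)
d, workers, steps, I, lr = 10, 4, 200, 10, 0.05

# Toy objective f(x) = 0.5 * ||x - x_star||^2 with noisy gradients (illustrative only).
x_star = rng.normal(size=d)

def stochastic_grad(x):
    return (x - x_star) + 0.1 * rng.normal(size=d)

# All workers start from the same point.
x = [np.zeros(d) for _ in range(workers)]

for t in range(1, steps + 1):
    # Local step: each worker updates its own copy, no communication.
    x = [xi - lr * stochastic_grad(xi) for xi in x]
    # Every I steps, average the models and restart every worker from the average.
    if t % I == 0:
        avg = sum(x) / workers
        x = [avg.copy() for _ in range(workers)]

print(np.linalg.norm(sum(x) / workers - x_star))
```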

A survey on deep learning for big data

Q Zhang, LT Yang, Z Chen, P Li - Information Fusion, 2018 - Elsevier
Deep learning, currently one of the most remarkable machine learning techniques, has
achieved great success in many applications such as image analysis, speech recognition …

Accurate, large minibatch SGD: Training ImageNet in 1 hour

P Goyal, P Dollár, R Girshick, P Noordhuis… - arXiv preprint arXiv …, 2017 - arxiv.org
Deep learning thrives with large neural networks and large datasets. However, larger
networks and larger datasets result in longer training times that impede research and …
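
The recipe this abstract is known for is the linear learning-rate scaling rule with gradual warmup: multiply the learning rate by the same factor as the minibatch size, and ramp it up over the first few epochs. A sketch of that schedule is below; the base learning rate of 0.1 for batch 256, the 8192 target batch, and the 5-epoch warmup follow the commonly cited description, but treat the exact constants here as assumptions.

```python
def scaled_lr(epoch, step, steps_per_epoch,
              base_lr=0.1, base_batch=256, batch=8192, warmup_epochs=5):
    """Linear scaling rule with gradual warmup.

    Target LR is base_lr * (batch / base_batch); during the first
    `warmup_epochs` epochs the LR ramps linearly from base_lr to the target.
    """
    target = base_lr * batch / base_batch
    if epoch >= warmup_epochs:
        return target
    progress = (epoch * steps_per_epoch + step) / (warmup_epochs * steps_per_epoch)
    return base_lr + progress * (target - base_lr)

# Example: LR at the start, mid-warmup, and after warmup (100 steps per epoch).
print(scaled_lr(0, 0, 100))    # 0.1
print(scaled_lr(2, 50, 100))   # halfway through the warmup ramp
print(scaled_lr(5, 0, 100))    # 3.2 = 0.1 * 8192 / 256
```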

Don't use large mini-batches, use local SGD

T Lin, SU Stich, KK Patel, M Jaggi - arXiv preprint arXiv:1808.07217, 2018 - arxiv.org
Mini-batch stochastic gradient methods (SGD) are state of the art for distributed training of
deep neural networks. Drastic increases in mini-batch sizes have led to key efficiency …
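
The trade-off argued in this snippet can be made concrete with simple arithmetic: for the same total number of gradient evaluations, synchronous mini-batch SGD communicates at every update, whereas local SGD with H local steps communicates H times less often. The worker count, batch size, and H below are illustrative assumptions.

```python
# Assumed setup: K workers, per-worker batch b, H local steps, budget of G gradient samples.
K, b, H, G = 16, 32, 8, 1_000_000

minibatch_updates = G // (K * b)                 # every update syncs all workers
minibatch_comm_rounds = minibatch_updates

local_sgd_steps = G // (K * b)                   # same number of per-worker steps
local_sgd_comm_rounds = local_sgd_steps // H     # sync only every H local steps

print(minibatch_comm_rounds, local_sgd_comm_rounds)  # 1953 vs 244
```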

On the linear speedup analysis of communication efficient momentum SGD for distributed non-convex optimization

H Yu, R Jin, S Yang - International Conference on Machine …, 2019 - proceedings.mlr.press
Recent developments in large-scale distributed machine learning applications, e.g., deep
neural networks, benefit enormously from the advances in distributed non-convex …
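
One common way to write the kind of communication-efficient momentum SGD the title refers to: each worker runs local heavy-ball momentum steps, and the models are averaged every I iterations. With K workers, step size \eta, momentum \beta, and averaging period I, one illustrative formulation (how the momentum buffers are handled at averaging time differs across variants, so this is not the paper's exact algorithm) is:

```latex
v_t^{(k)} = \beta\, v_{t-1}^{(k)} + \nabla f_k\bigl(x_{t-1}^{(k)};\, \xi_t^{(k)}\bigr),
\qquad
x_t^{(k)} =
\begin{cases}
x_{t-1}^{(k)} - \eta\, v_t^{(k)}, & t \bmod I \neq 0,\\[4pt]
\dfrac{1}{K} \sum_{j=1}^{K} \bigl(x_{t-1}^{(j)} - \eta\, v_t^{(j)}\bigr), & t \bmod I = 0.
\end{cases}
```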

Qsparse-local-SGD: Distributed SGD with quantization, sparsification and local computations

D Basu, D Data, C Karakus… - Advances in Neural …, 2019 - proceedings.neurips.cc
The communication bottleneck has been identified as a significant issue in distributed
optimization of large-scale learning models. Recently, several approaches to mitigate this …
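
The combination named in the title can be sketched as a compression operator applied to each worker's update before communication, with an error-feedback memory that re-injects whatever was dropped into the next round. The operator below (top-k sparsification followed by a crude sign-magnitude quantization) is one illustrative instance, not the paper's exact operator or schedule.

```python
import numpy as np

def compress(update, memory, k=10):
    """Top-k sparsify + sign-magnitude quantization with error feedback.

    `memory` accumulates whatever this step throws away, so it is added back
    into the next update (error compensation). Illustrative sketch only.
    """
    corrected = update + memory
    # Keep only the k largest-magnitude coordinates.
    idx = np.argpartition(np.abs(corrected), -k)[-k:]
    sparse = np.zeros_like(corrected)
    # Quantize kept coordinates to a single shared magnitude times their sign.
    scale = np.mean(np.abs(corrected[idx]))
    sparse[idx] = scale * np.sign(corrected[idx])
    new_memory = corrected - sparse          # what was lost, kept for next round
    return sparse, new_memory

# Example: compress a random 100-dim update; only 10 coordinates survive.
rng = np.random.default_rng(3)
g = rng.normal(size=100)
mem = np.zeros(100)
compressed, mem = compress(g, mem)
print(np.count_nonzero(compressed))  # 10
```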

On the convergence of local descent methods in federated learning

F Haddadpour, M Mahdavi - arXiv preprint arXiv:1910.14425, 2019 - arxiv.org
In federated distributed learning, the goal is to optimize a global training objective defined
over distributed devices, where the data shard at each device is sampled from a possibly …
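
The global objective described here is conventionally written as a weighted sum of per-device objectives, each an expectation over that device's (possibly non-identically distributed) data shard; with K devices and weights p_k, a standard formulation is:

```latex
\min_{x \in \mathbb{R}^d} \; F(x) = \sum_{k=1}^{K} p_k\, F_k(x),
\qquad
F_k(x) = \mathbb{E}_{\xi \sim \mathcal{D}_k}\bigl[f(x;\, \xi)\bigr],
\qquad
p_k \ge 0,\; \sum_{k=1}^{K} p_k = 1.
```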