A survey on distributed machine learning
J Verbraeken, M Wolting, J Katzy… - ACM Computing Surveys …, 2020 - dl.acm.org
The demand for artificial intelligence has grown significantly over the past decade, and this
growth has been fueled by advances in machine learning techniques and the ability to …
Demystifying parallel and distributed deep learning: An in-depth concurrency analysis
Deep Neural Networks (DNNs) are becoming an important tool in modern computing
applications. Accelerating their training is a major challenge, and techniques range from …
A field guide to federated optimization
Federated learning and analytics are a distributed approach for collaboratively learning
models (or statistics) from decentralized data, motivated by and designed for privacy …
Parallel restarted SGD with faster convergence and less communication: Demystifying why model averaging works for deep learning
In distributed training of deep neural networks, parallel minibatch SGD is widely used to
speed up the training process by using multiple workers. It uses multiple workers to sample …
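The model-averaging scheme this entry analyzes admits a very small simulation: several workers each take a few local SGD steps and then average their parameters before restarting the next round. The NumPy sketch below is only an illustration under my own toy setup (the quadratic objective, the noise model, and all function names are assumptions, not the paper's experiments).

import numpy as np

def parallel_restarted_sgd(grad_fn, w0, num_workers=4, rounds=10,
                           local_steps=5, lr=0.1, noise=0.1, seed=0):
    # Toy sketch: local SGD on each worker, with periodic model averaging
    # ("restarts") as the only communication step per round.
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(rounds):
        # Every worker starts the round from the shared, averaged model.
        local_models = np.tile(w, (num_workers, 1))
        for k in range(num_workers):
            for _ in range(local_steps):
                # Noisy gradient stands in for a stochastic minibatch gradient.
                g = grad_fn(local_models[k]) + rng.normal(scale=noise, size=w.shape)
                local_models[k] -= lr * g
        # Model averaging across workers ends the round.
        w = local_models.mean(axis=0)
    return w

# Example: minimize f(w) = 0.5 * ||w||^2, whose gradient is simply w.
print(parallel_restarted_sgd(lambda w: w, w0=[5.0, -3.0]))  # ends up close to the origin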
A survey on deep learning for big data
Deep learning, currently one of the most remarkable machine learning techniques, has achieved great success in many applications such as image analysis, speech recognition …
Accurate, large minibatch SGD: Training ImageNet in 1 hour
Deep learning thrives with large neural networks and large datasets. However, larger
networks and larger datasets result in longer training times that impede research and …
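The central recipe of this paper is the linear learning-rate scaling rule with gradual warmup: when the minibatch size grows by a factor k, the learning rate is multiplied by k, and the scaled rate is ramped up from the base rate over the first few epochs. The helper below is a rough sketch of that schedule (the function and its argument names are my own; the reference batch of 256 and the 5-epoch warmup follow the paper's ImageNet setting).

def scaled_lr(base_lr, batch_size, epoch, it, iters_per_epoch,
              ref_batch=256, warmup_epochs=5):
    # Linear scaling rule: the target learning rate grows in proportion to the
    # minibatch size, relative to a reference batch of ref_batch images.
    target_lr = base_lr * batch_size / ref_batch
    if epoch < warmup_epochs:
        # Gradual warmup: ramp linearly from base_lr up to target_lr.
        progress = (epoch * iters_per_epoch + it) / (warmup_epochs * iters_per_epoch)
        return base_lr + progress * (target_lr - base_lr)
    return target_lr

# Example: base LR 0.1 tuned for batch 256, scaled for a batch of 8192.
print(scaled_lr(0.1, 8192, epoch=0, it=0, iters_per_epoch=100))  # 0.1 at the start of warmup
print(scaled_lr(0.1, 8192, epoch=5, it=0, iters_per_epoch=100))  # 3.2 once warmup is done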
Don't use large mini-batches, use local SGD
Mini-batch stochastic gradient methods (SGD) are the state of the art for distributed training of deep neural networks. Drastic increases in mini-batch sizes have led to key efficiency …
On the linear speedup analysis of communication efficient momentum SGD for distributed non-convex optimization
Recent developments in large-scale distributed machine learning applications, e.g., deep neural networks, benefit enormously from advances in distributed non-convex …
Qsparse-local-SGD: Distributed SGD with quantization, sparsification and local computations
The communication bottleneck has been identified as a significant issue in distributed optimization of large-scale learning models. Recently, several approaches to mitigate this …
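Qsparse-local-SGD composes quantization and sparsification of the communicated updates with local steps and an error-compensation memory. The sketch below shows one plausible compressor of that kind (top-k selection followed by uniform quantization, with the compression error carried over to the next round); it illustrates the general idea rather than the paper's exact operator.

import numpy as np

def topk_quantize(v, k, levels=16):
    # Keep the k largest-magnitude entries, then quantize them uniformly.
    # A sketch of a composed sparsification + quantization operator, not the
    # paper's exact scheme.
    out = np.zeros_like(v)
    if k <= 0 or not np.any(v):
        return out
    idx = np.argpartition(np.abs(v), -k)[-k:]   # top-k sparsification
    step = np.abs(v[idx]).max() / levels
    out[idx] = np.round(v[idx] / step) * step   # uniform quantization
    return out

def compress_with_memory(update, memory, k):
    # Error compensation: whatever the compressor drops is remembered locally
    # and added back to the next update before compressing again.
    corrected = update + memory
    compressed = topk_quantize(corrected, k)
    return compressed, corrected - compressed

# Example: compress one worker's local model update before communication.
rng = np.random.default_rng(0)
memory = np.zeros(10)
sent, memory = compress_with_memory(rng.normal(size=10), memory, k=3)
print(sent)    # sparse, quantized update that would actually be sent
print(memory)  # residual kept on the worker for error compensation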
On the convergence of local descent methods in federated learning
F Haddadpour, M Mahdavi - arXiv preprint arXiv:1910.14425, 2019 - arxiv.org
In federated distributed learning, the goal is to optimize a global training objective defined
over distributed devices, where the data shard at each device is sampled from a possibly …
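The global training objective referred to here is the usual federated formulation: a weighted average of per-device objectives, each defined over that device's (possibly distinct) local data distribution. In LaTeX, one standard way to write it is

\min_{w \in \mathbb{R}^d} F(w) \;=\; \sum_{i=1}^{m} p_i \, F_i(w),
\qquad F_i(w) \;=\; \mathbb{E}_{z \sim \mathcal{D}_i}\!\left[ f(w; z) \right],
\qquad p_i \ge 0,\ \ \sum_{i=1}^{m} p_i = 1,

where m is the number of devices, \mathcal{D}_i is device i's local data distribution, and p_i is its weight (often the fraction of the total data held by device i). Heterogeneity across the \mathcal{D}_i is exactly what makes the convergence analysis of local descent methods nontrivial.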