Communication-efficient distributed deep learning: A comprehensive survey

Z Tang, S Shi, W Wang, B Li, X Chu - arXiv preprint arXiv:2003.06307, 2020 - arxiv.org
Distributed deep learning (DL) has become prevalent in recent years to reduce training time
by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and …

A unified theory of decentralized SGD with changing topology and local updates

A Koloskova, N Loizou, S Boreiri… - International …, 2020 - proceedings.mlr.press
Decentralized stochastic optimization methods have gained a lot of attention recently, mainly
because of their low per-iteration cost, data locality, and communication efficiency. In …

Decentralized stochastic optimization and gossip algorithms with compressed communication

A Koloskova, S Stich, M Jaggi - International Conference on …, 2019 - proceedings.mlr.press
We consider decentralized stochastic optimization with the objective function (e.g., data
samples for machine learning tasks) being distributed over n machines that can only …

FedPD: A federated learning framework with adaptivity to non-IID data

X Zhang, M Hong, S Dhople, W Yin… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Federated Learning (FL) is popular for communication-efficient learning from distributed
data. To utilize data at different clients without moving it to the cloud, algorithms such as …

Push–pull gradient methods for distributed optimization in networks

S Pu, W Shi, J Xu, A Nedić - IEEE Transactions on Automatic …, 2020 - ieeexplore.ieee.org
In this article, we focus on solving a distributed convex optimization problem in a network,
where each agent has its own convex cost function and the goal is to minimize the sum of …
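
The objective referenced in this entry, and again in the accelerated Nesterov entry and the Xin, Pu, Nedić, and Khan survey below, is the standard consensus optimization problem. A minimal LaTeX sketch of that formulation, with the number of agents n and the local costs f_i introduced here for illustration rather than taken from the abstracts:

  \min_{x \in \mathbb{R}^d} \; f(x) \;=\; \frac{1}{n} \sum_{i=1}^{n} f_i(x)

where agent i can evaluate (sub)gradients of its own f_i only and exchanges information solely with its neighbors in the network.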

A decentralized proximal-gradient method with network independent step-sizes and separated convergence rates

Z Li, W Shi, M Yan - IEEE Transactions on Signal Processing, 2019 - ieeexplore.ieee.org
This paper proposes a novel proximal-gradient algorithm for a decentralized optimization
problem with a composite objective containing smooth and nonsmooth terms. Specifically …
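
A hedged sketch of the composite problem this abstract describes, assuming each node i holds a smooth term s_i and the nonsmooth part is a shared, proximable regularizer r (the exact splitting used in the paper may differ):

  \min_{x \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^{n} s_i(x) \;+\; r(x)

Gradient steps handle the smooth terms s_i while the proximal operator of r handles the nonsmooth part, which is what makes a proximal-gradient scheme natural here.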

Accelerated distributed Nesterov gradient descent

G Qu, N Li - IEEE Transactions on Automatic Control, 2019 - ieeexplore.ieee.org
This paper considers the distributed optimization problem over a network, where the
objective is to optimize a global function formed by a sum of local functions, using only local …

Quasi-global momentum: Accelerating decentralized deep learning on heterogeneous data

T Lin, SP Karimireddy, SU Stich, M Jaggi - arXiv preprint arXiv:2102.04761, 2021 - arxiv.org
Decentralized training of deep learning models is a key element for enabling data privacy
and on-device learning over networks. In realistic learning scenarios, the presence of …

A general framework for decentralized optimization with first-order methods

R Xin, S Pu, A Nedić, UA Khan - Proceedings of the IEEE, 2020 - ieeexplore.ieee.org
Decentralized optimization to minimize a finite sum of functions, distributed over a network of
nodes, has been a significant area within control and signal-processing research due to its …

Optimal algorithms for non-smooth distributed optimization in networks

K Scaman, F Bach, S Bubeck… - Advances in Neural …, 2018 - proceedings.neurips.cc
In this work, we consider the distributed optimization of non-smooth convex functions using a
network of computing units. We investigate this problem under two regularity …