Communication-efficient distributed deep learning: A comprehensive survey
Distributed deep learning (DL) has become prevalent in recent years to reduce training time
by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and …
A unified theory of decentralized SGD with changing topology and local updates
Decentralized stochastic optimization methods have gained a lot of attention recently, mainly
because of their low per-iteration cost, data locality, and communication efficiency. In …
Decentralized stochastic optimization and gossip algorithms with compressed communication
We consider decentralized stochastic optimization with the objective function (e.g., data
samples for machine learning tasks) being distributed over n machines that can only …
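For orientation, a rough NumPy sketch of the compressed-gossip idea behind this line of work is given below: nodes keep public copies of each other's states, transmit only sparsified differences, and run consensus steps over those copies. The top-k compressor, step size, ring topology, and all identifiers are illustrative assumptions, not the paper's exact algorithm or tuning.

```python
import numpy as np

def top_k(vec, k):
    """Keep the k largest-magnitude entries, zero the rest (one common compressor)."""
    out = np.zeros_like(vec)
    idx = np.argpartition(np.abs(vec), -k)[-k:]
    out[idx] = vec[idx]
    return out

def compressed_gossip(x, W, k, gamma=0.1, steps=500):
    """Gossip averaging in which each node transmits only a compressed difference.

    x: (n_nodes, dim) local vectors; W: doubly stochastic mixing matrix.
    Each node maintains a public copy x_hat of its state that neighbors can track.
    """
    x_hat = np.zeros_like(x)
    for _ in range(steps):
        q = np.array([top_k(xi - hi, k) for xi, hi in zip(x, x_hat)])
        x_hat = x_hat + q                      # all nodes update the public copies
        x = x + gamma * (W @ x_hat - x_hat)    # consensus step on the public copies
    return x

# Toy run: 4 nodes on a ring, 5-dimensional vectors; values drift toward the average.
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])
x0 = np.random.default_rng(0).normal(size=(4, 5))
x_final = compressed_gossip(x0.copy(), W, k=2)
print(np.abs(x_final - x0.mean(axis=0)).max())  # consensus error after the run
```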
FedPD: A federated learning framework with adaptivity to non-IID data
Federated Learning (FL) is popular for communication-efficient learning from distributed
data. To utilize data at different clients without moving them to the cloud, algorithms such as …
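The truncated snippet alludes to local-update methods; without asserting which algorithm the abstract names, the sketch below shows the generic pattern such methods share: clients take several local gradient steps on private data, then a server averages their models. The quadratic toy objective and every identifier are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each client holds a private quadratic 0.5*||x - c_i||^2 (illustrative
# assumption), so the global minimizer is simply the mean of the c_i.
clients = [rng.normal(size=3) for _ in range(5)]

def local_update(x, c, lr=0.1, local_steps=10):
    """Run a few local gradient steps on one client's objective; no data leaves the client."""
    for _ in range(local_steps):
        x = x - lr * (x - c)                   # gradient of 0.5*||x - c||^2 is (x - c)
    return x

x_global = np.zeros(3)
for _ in range(20):                            # communication rounds
    local_models = [local_update(x_global.copy(), c) for c in clients]
    x_global = np.mean(local_models, axis=0)   # server averages the client models

print(x_global, np.mean(clients, axis=0))      # the two should roughly agree
```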
Push–pull gradient methods for distributed optimization in networks
In this article, we focus on solving a distributed convex optimization problem in a network,
where each agent has its own convex cost function and the goal is to minimize the sum of …
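Several entries above and below (the push–pull method, the accelerated Nesterov scheme, the general first-order framework) target the same canonical formulation, stated here once for reference:

```latex
\min_{x \in \mathbb{R}^d} \; f(x) \;=\; \frac{1}{n} \sum_{i=1}^{n} f_i(x),
```

where $f_i$ is the cost known only to agent $i$, and agents may exchange information only with their neighbors in the communication graph.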
A decentralized proximal-gradient method with network independent step-sizes and separated convergence rates
This paper proposes a novel proximal-gradient algorithm for a decentralized optimization
problem with a composite objective containing smooth and nonsmooth terms. Specifically …
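As background for this entry, the classical (centralized) proximal-gradient iteration that such decentralized methods build on handles a composite objective $F(x) = f(x) + g(x)$, with $f$ smooth and $g$ nonsmooth but proximable; the paper's decentralized update and its network-independent step sizes are not reproduced here.

```latex
x^{k+1} \;=\; \operatorname{prox}_{\alpha g}\!\bigl(x^k - \alpha \nabla f(x^k)\bigr),
\qquad
\operatorname{prox}_{\alpha g}(y) \;=\; \arg\min_{z} \Bigl\{\, g(z) + \tfrac{1}{2\alpha}\,\lVert z - y\rVert^2 \,\Bigr\}.
```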
Accelerated distributed Nesterov gradient descent
This paper considers the distributed optimization problem over a network, where the
objective is to optimize a global function formed by a sum of local functions, using only local …
Quasi-global momentum: Accelerating decentralized deep learning on heterogeneous data
Decentralized training of deep learning models is a key element for enabling data privacy
and on-device learning over networks. In realistic learning scenarios, the presence of …
A general framework for decentralized optimization with first-order methods
Decentralized optimization to minimize a finite sum of functions, distributed over a network of
nodes, has been a significant area within control and signal-processing research due to its …
Optimal algorithms for non-smooth distributed optimization in networks
In this work, we consider the distributed optimization of non-smooth convex functions using a
network of computing units. We investigate this problem under two regularity …