Communication-efficient distributed deep learning: A comprehensive survey

Z Tang, S Shi, W Wang, B Li, X Chu - arXiv preprint arXiv:2003.06307, 2020 - arxiv.org
Distributed deep learning (DL) has become prevalent in recent years to reduce training time
by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and …

A unified theory of decentralized SGD with changing topology and local updates

A Koloskova, N Loizou, S Boreiri… - International …, 2020 - proceedings.mlr.press
Decentralized stochastic optimization methods have gained a lot of attention recently, mainly
because of their low per-iteration cost, data locality, and communication efficiency. In …

Decentralized stochastic optimization and gossip algorithms with compressed communication

A Koloskova, S Stich, M Jaggi - International Conference on …, 2019 - proceedings.mlr.press
We consider decentralized stochastic optimization with the objective function (e.g., data
samples for machine learning tasks) being distributed over n machines that can only …

FedPD: A federated learning framework with adaptivity to non-IID data

X Zhang, M Hong, S Dhople, W Yin… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Federated Learning (FL) is popular for communication-efficient learning from distributed
data. To utilize data at different clients without moving it to the cloud, algorithms such as …

Push–pull gradient methods for distributed optimization in networks

S Pu, W Shi, J Xu, A Nedić - IEEE Transactions on Automatic …, 2020 - ieeexplore.ieee.org
In this article, we focus on solving a distributed convex optimization problem in a network,
where each agent has its own convex cost function and the goal is to minimize the sum of …
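
The objective referenced in this entry, and again in the accelerated Nesterov entry and the Xin, Pu, Nedić, and Khan survey below, is the standard consensus optimization problem. A minimal LaTeX sketch of that formulation, with the number of agents n and the local costs f_i introduced here for illustration rather than taken from the abstracts:

  \min_{x \in \mathbb{R}^d} \; f(x) \;=\; \frac{1}{n} \sum_{i=1}^{n} f_i(x)

where agent i can evaluate (sub)gradients of its own f_i only and exchanges information solely with its neighbors in the network.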

A decentralized proximal-gradient method with network independent step-sizes and separated convergence rates

Z Li, W Shi, M Yan - IEEE Transactions on Signal Processing, 2019 - ieeexplore.ieee.org
This paper proposes a novel proximal-gradient algorithm for a decentralized optimization
problem with a composite objective containing smooth and nonsmooth terms. Specifically …
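
A hedged sketch of the composite problem this abstract describes, assuming each node i holds a smooth term s_i and the nonsmooth part is a shared, proximable regularizer r (the exact splitting used in the paper may differ):

  \min_{x \in \mathbb{R}^d} \; \frac{1}{n} \sum_{i=1}^{n} s_i(x) \;+\; r(x)

Gradient steps handle the smooth terms s_i while the proximal operator of r handles the nonsmooth part, which is what makes a proximal-gradient scheme natural here.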

Accelerated distributed Nesterov gradient descent

G Qu, N Li - IEEE Transactions on Automatic Control, 2019 - ieeexplore.ieee.org
This paper considers the distributed optimization problem over a network, where the
objective is to optimize a global function formed by a sum of local functions, using only local …

Quasi-global momentum: Accelerating decentralized deep learning on heterogeneous data

T Lin, SP Karimireddy, SU Stich, M Jaggi - arXiv preprint arXiv:2102.04761, 2021 - arxiv.org
Decentralized training of deep learning models is a key element for enabling data privacy
and on-device learning over networks. In realistic learning scenarios, the presence of …

A general framework for decentralized optimization with first-order methods

R Xin, S Pu, A Nedić, UA Khan - Proceedings of the IEEE, 2020 - ieeexplore.ieee.org
Decentralized optimization to minimize a finite sum of functions, distributed over a network of
nodes, has been a significant area within control and signal-processing research due to its …

Optimal algorithms for non-smooth distributed optimization in networks

K Scaman, F Bach, S Bubeck… - Advances in Neural …, 2018 - proceedings.neurips.cc
In this work, we consider the distributed optimization of non-smooth convex functions using a
network of computing units. We investigate this problem under two regularity …