PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization

Z Li, H Bao, X Zhang… - … conference on machine …, 2021 - proceedings.mlr.press
In this paper, we propose a novel stochastic gradient estimator—ProbAbilistic Gradient
Estimator (PAGE)—for nonconvex optimization. PAGE is easy to implement as it is designed …
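The PAGE estimator alternates between an occasional full-gradient refresh (with small probability p) and a cheap minibatch correction of the previous estimator. The NumPy sketch below illustrates that update under stated assumptions; the oracle names grad_full and grad_batch and all default parameters are illustrative, not the paper's notation.

```python
import numpy as np

def page_step(x, x_prev, g_prev, grad_full, grad_batch, n, batch_size=32, p=0.1, lr=0.01,
              rng=np.random.default_rng()):
    """One PAGE-style iteration (illustrative sketch).
    grad_full(x)      : gradient over all n samples
    grad_batch(x, idx): gradient averaged over the samples indexed by idx
    """
    if rng.random() < p:
        g = grad_full(x)                                   # occasionally recompute the full gradient
    else:
        idx = rng.integers(0, n, size=batch_size)          # one shared minibatch for both points
        g = g_prev + grad_batch(x, idx) - grad_batch(x_prev, idx)  # recursive correction
    return x - lr * g, g                                   # plain gradient step with the estimator
```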

[BOOK][B] First-order and stochastic optimization methods for machine learning

G Lan - 2020 - Springer
Since its beginning, optimization has played a vital role in data science. The analysis and
solution methods for many statistical and machine learning models rely on optimization. The …

Acceleration for compressed gradient descent in distributed and federated optimization

Z Li, D Kovalev, X Qian, P Richtárik - arXiv preprint arXiv:2002.11364, 2020 - arxiv.org
Due to the high communication cost in distributed and federated learning problems,
methods relying on compression of communicated messages are becoming increasingly …
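A common compressor in this line of work is unbiased random sparsification (rand-k). The sketch below shows such a compressor inside a plain compressed gradient-descent step; it is a simplified illustration only, not the accelerated scheme analyzed in the paper, and the function names are assumptions.

```python
import numpy as np

def rand_k(v, k, rng=np.random.default_rng()):
    """Keep k random coordinates, scaled by d/k so the compressor stays unbiased."""
    d = v.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * (d / k)
    return out

def compressed_gd_step(x, worker_grads, k=10, lr=0.1):
    """Each worker compresses its gradient; the server averages and takes a step."""
    avg = np.mean([rand_k(g, k) for g in worker_grads], axis=0)
    return x - lr * avg
```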

Convex optimization algorithms in medical image reconstruction—in the age of AI

J Xu, F Noo - Physics in Medicine & Biology, 2022 - iopscience.iop.org
The past decade has seen the rapid growth of model-based image reconstruction (MBIR)
algorithms, which are often applications or adaptations of convex optimization algorithms …

Variance reduction is an antidote to byzantines: Better rates, weaker assumptions and communication compression as a cherry on the top

E Gorbunov, S Horváth, P Richtárik, G Gidel - arXiv preprint arXiv …, 2022 - arxiv.org
Byzantine-robustness has been gaining a lot of attention due to growing interest in
collaborative and federated learning. However, many fruitful directions, such as the usage of …

Sharper rates for separable minimax and finite sum optimization via primal-dual extragradient methods

Y Jin, A Sidford, K Tian - Conference on Learning Theory, 2022 - proceedings.mlr.press
We design accelerated algorithms with improved rates for several fundamental classes of
optimization problems. Our algorithms all build upon techniques related to the analysis of …
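For reference, the classical extragradient template underlying primal-dual extragradient methods takes a look-ahead step and then updates with the look-ahead gradients. The sketch below shows that basic template for a smooth min-max problem; the paper's separable and finite-sum variants add randomization and sharper rate analyses that are omitted here, and all names are illustrative.

```python
def extragradient_step(x, y, grad_x, grad_y, eta=0.1):
    """One extragradient step for min_x max_y f(x, y), given gradient oracles."""
    x_mid = x - eta * grad_x(x, y)           # extrapolation (look-ahead) step
    y_mid = y + eta * grad_y(x, y)
    x_new = x - eta * grad_x(x_mid, y_mid)   # actual update uses look-ahead gradients
    y_new = y + eta * grad_y(x_mid, y_mid)
    return x_new, y_new
```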

EF21 with bells & whistles: Practical algorithmic extensions of modern error feedback

I Fatkhullin, I Sokolov, E Gorbunov, Z Li… - arXiv preprint arXiv …, 2021 - arxiv.org
First proposed by Seide (2014) as a heuristic, error feedback (EF) is a very popular
mechanism for enforcing convergence of distributed gradient-based optimization methods …
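In the EF21 variant of error feedback, each worker keeps a gradient estimate and communicates only a compressed correction toward its fresh local gradient. Below is a minimal sketch of one round with a top-k compressor, assuming per-worker gradient oracles; it is a simplification, not the full method with the extensions the paper studies.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude coordinates (a standard contractive compressor)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef21_round(x, g_list, worker_grads, k=10, lr=0.1):
    """One EF21-style round: workers compress the change in their gradients, not the gradients."""
    g_avg = np.mean(g_list, axis=0)
    x_next = x - lr * g_avg                       # server step with current estimates
    new_g_list = []
    for g_i, grad_i in zip(g_list, worker_grads):
        c_i = top_k(grad_i(x_next) - g_i, k)      # compressed correction sent to the server
        new_g_list.append(g_i + c_i)              # local estimate update
    return x_next, new_g_list
```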

Adaptive stochastic variance reduction for non-convex finite-sum minimization

A Kavis, S Skoulakis… - Advances in …, 2022 - proceedings.neurips.cc
We propose an adaptive variance-reduction method, called AdaSpider, for minimization of
$L$-smooth, non-convex functions with a finite-sum structure. In essence, AdaSpider …
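AdaSpider combines a SPIDER-type recursive gradient estimator with an adaptive step size. The sketch below pairs the SPIDER recursion with an AdaGrad-style step-size rule as a stand-in; the exact adaptive rule and restart schedule in the paper may differ, and all names and defaults are illustrative assumptions.

```python
import numpy as np

def adaspider_run(x0, grad_full, grad_batch, n, epochs=5, epoch_len=100,
                  batch_size=32, eta0=1.0, rng=np.random.default_rng()):
    x = x0.copy()
    accum = 0.0                                    # running sum of squared estimator norms
    for _ in range(epochs):
        v = grad_full(x)                           # estimator reset at the start of each epoch
        for _ in range(epoch_len):
            accum += float(np.sum(v ** 2))
            step = eta0 / (1.0 + np.sqrt(accum))   # adaptive step size (illustrative rule)
            x_prev, x = x, x - step * v
            idx = rng.integers(0, n, size=batch_size)
            v = v + grad_batch(x, idx) - grad_batch(x_prev, idx)   # SPIDER recursion
    return x
```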

CANITA: Faster rates for distributed convex optimization with communication compression

Z Li, P Richtárik - Advances in Neural Information …, 2021 - proceedings.neurips.cc
Due to the high communication cost in distributed and federated learning, methods relying
on compressed communication are becoming increasingly popular. Besides, the best …

FedPAGE: A fast local stochastic gradient method for communication-efficient federated learning

H Zhao, Z Li, P Richtárik - arXiv preprint arXiv:2108.04755, 2021 - arxiv.org
Federated Averaging (FedAvg, also known as Local-SGD) (McMahan et al., 2017) is a
classical federated learning algorithm in which clients run multiple local SGD steps before …
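As the snippet notes, FedAvg/Local-SGD has every client run several local SGD steps before the server averages the resulting models. Below is a minimal sketch of one communication round, assuming per-client stochastic gradient oracles; FedPAGE builds on this template with PAGE-style gradient estimators, which the sketch omits.

```python
import numpy as np

def fedavg_round(x, client_grads, local_steps=5, lr=0.01, rng=np.random.default_rng()):
    """One FedAvg round: local SGD on every client, then server-side model averaging."""
    local_models = []
    for grad_i in client_grads:
        x_i = x.copy()
        for _ in range(local_steps):
            x_i = x_i - lr * grad_i(x_i, rng)     # local stochastic gradient step
        local_models.append(x_i)
    return np.mean(local_models, axis=0)          # server averages the local models
```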