PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization
In this paper, we propose a novel stochastic gradient estimator—ProbAbilistic Gradient
Estimator (PAGE)—for nonconvex optimization. PAGE is easy to implement as it is designed …
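The abstract snippet is truncated, so as a rough illustration of what a PAGE-style estimator looks like, the following Python/NumPy sketch alternates (with probability p) between recomputing a minibatch gradient and reusing the previous estimate plus a cheap minibatch gradient-difference correction. The toy quadratic objective, batch size, probability p, and step size are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy finite-sum problem: f(x) = (1/n) * sum_i 0.5 * ||A_i x - y_i||^2
n, d = 100, 10
A = rng.normal(size=(n, d))
y = rng.normal(size=n)

def grad_batch(x, idx):
    """Average gradient of the components indexed by idx."""
    Ai, yi = A[idx], y[idx]
    return Ai.T @ (Ai @ x - yi) / len(idx)

def page_like_step(x, g, p=0.1, b=20, lr=0.05):
    """One step with a PAGE-style probabilistic gradient estimator (sketch).

    With probability p, recompute a minibatch gradient from scratch;
    otherwise reuse the previous estimator plus a minibatch correction
    of gradient differences. All constants here are illustrative.
    """
    x_new = x - lr * g
    idx = rng.choice(n, size=b, replace=False)
    if rng.random() < p:
        g_new = grad_batch(x_new, idx)
    else:
        g_new = g + grad_batch(x_new, idx) - grad_batch(x, idx)
    return x_new, g_new

x = np.zeros(d)
g = grad_batch(x, np.arange(n))   # initialize with a full gradient
for _ in range(200):
    x, g = page_like_step(x, g)
```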
[BOOK][B] First-order and stochastic optimization methods for machine learning
G Lan - 2020 - Springer
Since its beginning, optimization has played a vital role in data science. The analysis and
solution methods for many statistical and machine learning models rely on optimization. The …
Acceleration for compressed gradient descent in distributed and federated optimization
Due to the high communication cost in distributed and federated learning problems,
methods relying on compression of communicated messages are becoming increasingly …
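To make the notion of compressed communication concrete, here is a minimal sketch of a generic top-k sparsifier applied to worker gradients before aggregation. This is only a common illustrative compressor and a plain (unaccelerated) compressed gradient step, not the accelerated method analyzed in the paper.

```python
import numpy as np

def topk_compress(g, k):
    """Keep only the k largest-magnitude entries of g (a common sparsifier)."""
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out[idx] = g[idx]
    return out

def compressed_gd_step(x, local_grads, k, lr=0.1):
    """Each worker compresses its gradient before it is averaged at the server."""
    msgs = [topk_compress(g, k) for g in local_grads]
    return x - lr * np.mean(msgs, axis=0)
```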
Convex optimization algorithms in medical image reconstruction—in the age of AI
The past decade has seen the rapid growth of model based image reconstruction (MBIR)
algorithms, which are often applications or adaptations of convex optimization algorithms …
Variance reduction is an antidote to byzantines: Better rates, weaker assumptions and communication compression as a cherry on the top
Byzantine-robustness has been gaining a lot of attention due to the growth of the interest in
collaborative and federated learning. However, many fruitful directions, such as the usage of …
Sharper rates for separable minimax and finite sum optimization via primal-dual extragradient methods
We design accelerated algorithms with improved rates for several fundamental classes of
optimization problems. Our algorithms all build upon techniques related to the analysis of …
EF21 with bells & whistles: Practical algorithmic extensions of modern error feedback
First proposed by Seide (2014) as a heuristic, error feedback (EF) is a very popular
mechanism for enforcing convergence of distributed gradient-based optimization methods …
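For context on the error feedback (EF) mechanism the snippet refers to, here is a minimal sketch of the classical Seide-style EF loop with a biased top-k compressor: the compression error is stored and added back before the next compression, so no gradient information is permanently discarded. The compressor choice and step size are illustrative assumptions; this is not the EF21 variant or the extensions studied in the paper.

```python
import numpy as np

def topk(v, k):
    """Biased top-k sparsifier: keep the k largest-magnitude coordinates."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef_step(x, e, grad_fn, k=5, lr=0.1):
    """One classical error-feedback step (sketch).

    The accumulated compression error e is added back before compressing,
    so that what the compressor drops is eventually applied later.
    """
    p = e + lr * grad_fn(x)   # add back previously accumulated error
    msg = topk(p, k)          # what would actually be communicated
    e_new = p - msg           # remember what the compressor dropped
    return x - msg, e_new
```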
Adaptive stochastic variance reduction for non-convex finite-sum minimization
A Kavis, S Skoulakis… - Advances in …, 2022 - proceedings.neurips.cc
We propose an adaptive variance-reduction method, called AdaSpider, for minimization of
$L$-smooth, non-convex functions with a finite-sum structure. In essence, AdaSpider …
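Since the snippet is truncated, the following sketch shows the SPIDER-type recursive variance-reduced estimator that AdaSpider builds on: a periodic full-gradient refresh combined with cheap minibatch gradient-difference updates in between. The refresh period q, batch size b, and the AdaGrad-like step size are illustrative placeholders and not the paper's actual adaptive rule.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 10
A = rng.normal(size=(n, d))
y = rng.normal(size=n)

def grad_batch(x, idx):
    Ai, yi = A[idx], y[idx]
    return Ai.T @ (Ai @ x - yi) / len(idx)

def spider_run(steps=200, q=20, b=10, eps=1e-8):
    """SPIDER-style recursive estimator with a placeholder adaptive step.

    Every q iterations the estimator is refreshed with a full gradient;
    in between it is updated with minibatch gradient differences. The
    step size below is only a stand-in for AdaSpider's adaptive rule.
    """
    x = np.zeros(d)
    v = grad_batch(x, np.arange(n))
    acc = 0.0
    for t in range(steps):
        acc += np.linalg.norm(v) ** 2
        lr = 0.5 / (np.sqrt(acc) + eps)     # illustrative adaptive step
        x_prev, x = x, x - lr * v
        if (t + 1) % q == 0:
            v = grad_batch(x, np.arange(n))  # periodic full-gradient refresh
        else:
            idx = rng.choice(n, size=b, replace=False)
            v = v + grad_batch(x, idx) - grad_batch(x_prev, idx)
    return x
```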
CANITA: Faster rates for distributed convex optimization with communication compression
Z Li, P Richtárik - Advances in Neural Information …, 2021 - proceedings.neurips.cc
Due to the high communication cost in distributed and federated learning, methods relying
on compressed communication are becoming increasingly popular. Besides, the best …
FedPAGE: A fast local stochastic gradient method for communication-efficient federated learning
Federated Averaging (FedAvg, also known as Local-SGD) (McMahan et al., 2017) is a
classical federated learning algorithm in which clients run multiple local SGD steps before …
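Because the snippet describes the FedAvg/Local-SGD template only up to the truncation, here is a minimal sketch of one FedAvg communication round: each client runs a few local SGD steps from the current server model, and the server averages the returned models. The number of local steps, batch size, and learning rate are illustrative; this sketches the FedAvg baseline the paper starts from, not FedPAGE itself.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_sgd(x, grad_fn, data, local_steps=5, lr=0.1, batch=8):
    """Run several local SGD steps on one client's data array (sketch)."""
    x = x.copy()
    for _ in range(local_steps):
        idx = rng.choice(len(data), size=batch, replace=False)
        x -= lr * grad_fn(x, data[idx])
    return x

def fedavg_round(x_server, clients, grad_fn):
    """One FedAvg round: broadcast, local training, then simple averaging.

    `clients` is a list of per-client data arrays and `grad_fn(x, batch)`
    returns a stochastic gradient on that batch (both supplied by the user).
    """
    local_models = [local_sgd(x_server, grad_fn, data) for data in clients]
    return np.mean(local_models, axis=0)
```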