Robustness to unbounded smoothness of generalized SignSGD
Traditional analyses in non-convex optimization typically rely on the smoothness
assumption, namely requiring the gradients to be Lipschitz. However, recent evidence …
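The entry above concerns a SignSGD-type method under relaxed smoothness assumptions. As a hedged point of reference, the sketch below implements only the plain SignSGD update (step along the coordinate-wise sign of a stochastic gradient) on a toy quadratic; the objective, step size, and noise model are illustrative assumptions and do not reproduce the paper's generalized method or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_grad(x):
    # Toy objective f(x) = 0.5 * ||x||^2 with additive Gaussian gradient noise.
    return x + 0.1 * rng.standard_normal(x.shape)

def signsgd(x0, lr=0.01, steps=500):
    # Plain SignSGD: move along the coordinate-wise sign of the stochastic gradient.
    x = x0.copy()
    for _ in range(steps):
        g = stochastic_grad(x)
        x -= lr * np.sign(g)
    return x

x_final = signsgd(np.full(10, 5.0))
print("final ||x||:", np.linalg.norm(x_final))
```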
High-dimensional limit theorems for SGD: Effective dynamics and critical scaling
G Ben Arous, R Gheissari… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in
the high-dimensional regime. We prove limit theorems for the trajectories of summary …
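To make the idea of tracking a low-dimensional summary statistic of constant step-size SGD concrete, here is a hedged sketch on a toy high-dimensional linear regression with a planted signal; the tracked statistic (alignment with the planted direction), the step-size scaling, and all problem parameters are illustrative choices, not the setting analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
d, lr, steps = 1000, 0.5, 5000
w_star = np.zeros(d); w_star[0] = 1.0        # planted signal direction
w = rng.standard_normal(d) / np.sqrt(d)      # random high-dimensional init

overlaps = []
for t in range(steps):
    a = rng.standard_normal(d)               # fresh sample each step (online SGD)
    y = a @ w_star + 0.1 * rng.standard_normal()
    grad = (a @ w - y) * a                   # squared-loss stochastic gradient
    w -= (lr / d) * grad                     # constant step size, scaled with dimension
    overlaps.append(w @ w_star)              # scalar summary statistic of the trajectory

print("final overlap with signal:", overlaps[-1])
```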
Private stochastic convex optimization: optimal rates in linear time
We study differentially private (DP) algorithms for stochastic convex optimization: the
problem of minimizing the population loss given iid samples from a distribution over convex …
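The entry above is about differentially private stochastic convex optimization. As a hedged illustration of the standard noisy-SGD template commonly used in this area (per-sample gradient clipping plus Gaussian noise), here is a minimal sketch; the clipping threshold, noise scale, and toy logistic-loss objective are assumptions for illustration, and the paper's actual algorithm and optimal rates are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)

def per_sample_grads(w, X, y):
    # Logistic-loss gradients, one row per sample.
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return (p - y)[:, None] * X

def dp_sgd(X, y, lr=0.1, clip=1.0, noise_mult=1.0, epochs=20, batch=32):
    # Noisy-SGD template: clip each per-sample gradient to norm <= clip,
    # sum, add Gaussian noise calibrated to the clipping bound, then average.
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        idx = rng.choice(n, size=batch, replace=False)
        G = per_sample_grads(w, X[idx], y[idx])
        norms = np.linalg.norm(G, axis=1, keepdims=True)
        G = G / np.maximum(1.0, norms / clip)             # per-sample clipping
        noise = noise_mult * clip * rng.standard_normal(d)
        w -= lr * (G.sum(axis=0) + noise) / batch
    return w

X = rng.standard_normal((200, 5))
y = (X @ np.ones(5) + 0.5 * rng.standard_normal(200) > 0).astype(float)
print("weights:", np.round(dp_sgd(X, y), 3))
```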
Online distributed algorithms for online noncooperative games with stochastic cost functions: high probability bound of regrets
K Lu - IEEE Transactions on Automatic Control, 2024 - ieeexplore.ieee.org
In this article, online noncooperative games without full decision information are studied,
where the goal of players is to seek the Nash equilibria in a distributed manner. Different …
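As a loose, hedged illustration of distributed Nash-equilibrium seeking, the sketch below runs simultaneous gradient play on a toy two-player quadratic game with noisy partial gradients; the game, step size, and information structure are assumptions for illustration and do not follow the online algorithm or feedback setting of the paper.

```python
import numpy as np

rng = np.random.default_rng(3)

# Two-player quadratic game: each player minimizes its own cost in its own
# decision variable, given the other player's current action.
def grad1(x1, x2):  # d/dx1 of cost1 = 0.5*x1^2 + 0.5*x1*x2, plus noise
    return x1 + 0.5 * x2 + 0.05 * rng.standard_normal()
def grad2(x1, x2):  # d/dx2 of cost2 = 0.5*x2^2 - 0.5*x1*x2, plus noise
    return x2 - 0.5 * x1 + 0.05 * rng.standard_normal()

x1, x2, lr = 2.0, -3.0, 0.1
for t in range(500):
    g1, g2 = grad1(x1, x2), grad2(x1, x2)   # simultaneous, uncoordinated updates
    x1, x2 = x1 - lr * g1, x2 - lr * g2

print("approximate Nash equilibrium:", round(x1, 3), round(x2, 3))
```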
High probability convergence of stochastic gradient methods
In this work, we describe a generic approach to show convergence with high probability for
both stochastic convex and non-convex optimization with sub-Gaussian noise. In previous …
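To connect the high-probability viewpoint to something checkable, here is a hedged empirical sketch that runs many independent SGD trials on a toy non-convex objective with sub-Gaussian noise and reports an upper quantile of the final gradient magnitude rather than its mean; the objective, noise model, and quantile are illustrative assumptions, not the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(4)

def grad(x):
    # Toy non-convex objective f(x) = 0.25*x^4 - 0.5*x^2, with Gaussian gradient noise.
    return x**3 - x + 0.2 * rng.standard_normal()

def run_sgd(x0=3.0, lr=0.01, steps=2000):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return abs(x**3 - x)          # final noiseless gradient magnitude

finals = np.array([run_sgd() for _ in range(200)])
print("mean final |grad|:", finals.mean())
print("95th percentile of final |grad|:", np.quantile(finals, 0.95))
```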
On the convergence of adaptive gradient methods for nonconvex optimization
Adaptive gradient methods are workhorses in deep learning. However, the convergence
guarantees of adaptive gradient methods for nonconvex optimization have not been …
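The entry above concerns convergence of adaptive gradient methods in the nonconvex setting. As a hedged reference point, the sketch below implements a generic Adam-style update (first- and second-moment estimates with bias correction) on a toy nonconvex problem; the hyperparameters and objective are illustrative, and this is not the specific algorithm or analysis of the paper.

```python
import numpy as np

rng = np.random.default_rng(5)

def stochastic_grad(x):
    # Toy nonconvex objective: sum of cos(x_i) + 0.05*x_i^2, with gradient noise.
    return -np.sin(x) + 0.1 * x + 0.1 * rng.standard_normal(x.shape)

def adam(x0, lr=0.01, b1=0.9, b2=0.999, eps=1e-8, steps=2000):
    x, m, v = x0.copy(), np.zeros_like(x0), np.zeros_like(x0)
    for t in range(1, steps + 1):
        g = stochastic_grad(x)
        m = b1 * m + (1 - b1) * g                        # first-moment estimate
        v = b2 * v + (1 - b2) * g**2                     # second-moment estimate
        m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)  # bias correction
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)         # coordinate-wise adaptive step
    return x

print("final iterate:", np.round(adam(rng.uniform(-3, 3, size=5)), 3))
```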
The step decay schedule: A near optimal, geometrically decaying learning rate procedure for least squares
Minimax optimal convergence rates for numerous classes of stochastic convex optimization
problems are well characterized, where the majority of results utilize iterate averaged …
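As a small hedged sketch of the step-decay idea (hold the learning rate constant for a while, then cut it geometrically), here is SGD on a noisy quadratic with the rate halved at fixed intervals; the decay factor, interval, and problem are illustrative assumptions rather than the near-optimal schedule derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(6)

def sgd_step_decay(x0, lr0=0.5, decay=0.5, interval=200, steps=2000):
    # Step decay: the learning rate is piecewise constant and shrinks geometrically.
    x = x0
    for t in range(steps):
        lr = lr0 * decay ** (t // interval)
        g = x + 0.5 * rng.standard_normal()    # noisy gradient of 0.5*x^2
        x -= lr * g
    return x

print("final iterate:", sgd_step_decay(5.0))
```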
High probability generalization bounds for uniformly stable algorithms with nearly optimal rate
V Feldman, J Vondrak - Conference on Learning Theory, 2019 - proceedings.mlr.press
Algorithmic stability is a classical approach to understanding and analysis of the
generalization error of learning algorithms. A notable weakness of most stability-based …
Almost sure convergence rates for stochastic gradient descent and stochastic heavy ball
We study stochastic gradient descent (SGD) and the stochastic heavy ball method (SHB,
otherwise known as the momentum method) for the general stochastic approximation …
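The entry above treats SGD and the stochastic heavy ball (momentum) method. Below is a minimal hedged sketch of the SHB update on a noisy quadratic, where the momentum parameter, step size, and objective are illustrative choices rather than the general stochastic approximation setting of the paper.

```python
import numpy as np

rng = np.random.default_rng(7)

def shb(x0, lr=0.05, beta=0.9, steps=3000):
    # Stochastic heavy ball: x_{t+1} = x_t - lr * g_t + beta * (x_t - x_{t-1}).
    x_prev, x = x0, x0
    for _ in range(steps):
        g = x + 0.3 * rng.standard_normal()    # noisy gradient of 0.5*x^2
        x, x_prev = x - lr * g + beta * (x - x_prev), x
    return x

print("final iterate:", shb(10.0))
```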
Improved convergence in high probability of clipped gradient methods with heavy tailed noise
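Finally, as a hedged illustration of clipped stochastic gradient methods under heavy-tailed noise, the sketch below runs SGD with gradient-norm clipping where the noise is drawn from a Student-t distribution with infinite variance; the clipping level, step size, and noise law are illustrative assumptions, and the paper's specific method and high-probability guarantees are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(8)

def clipped_sgd(x0, lr=0.05, clip=1.0, steps=3000):
    # Clip the stochastic gradient to norm <= clip before each step.
    x = x0.copy()
    for _ in range(steps):
        g = x + rng.standard_t(df=2, size=x.shape)    # heavy-tailed gradient noise
        g = g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
        x -= lr * g
    return x

print("final ||x||:", np.linalg.norm(clipped_sgd(np.full(5, 3.0))))
```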