Robustness to unbounded smoothness of generalized SignSGD

M Crawshaw, M Liu, F Orabona… - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
Traditional analyses in non-convex optimization typically rely on the smoothness
assumption, namely requiring the gradients to be Lipschitz. However, recent evidence …
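
A minimal sketch of the sign-based update at the heart of such methods (not the paper's generalized algorithm; `lr` and `beta` are illustrative values):

```python
import numpy as np

def signsgd_momentum(grad_fn, w0, lr=0.01, beta=0.9, steps=100):
    """SignSGD with momentum: the step direction is the elementwise
    sign of an exponential moving average of stochastic gradients."""
    w = w0.copy()
    m = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        m = beta * m + (1 - beta) * g   # momentum buffer
        w -= lr * np.sign(m)            # sign-based update
    return w

# Toy usage: minimize the quadratic f(w) = 0.5 * ||w||^2.
print(signsgd_momentum(lambda w: w, np.ones(5)))
```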

High-dimensional limit theorems for SGD: Effective dynamics and critical scaling

G Ben Arous, R Gheissari… - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in
the high-dimensional regime. We prove limit theorems for the trajectories of summary …
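
To illustrate the setting, a toy sketch of constant step-size online SGD on a hypothetical teacher-student least-squares model, tracking a one-dimensional summary statistic (the overlap with the teacher); the `delta / d` step-size scaling mirrors the high-dimensional regime, but the model and constants are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, steps, delta = 1000, 5000, 0.5        # dimension, iterations, step-size scale
w_star = np.ones(d) / np.sqrt(d)         # teacher direction (unit norm)
w = rng.normal(size=d) / np.sqrt(d)      # random student initialization

overlap = []                             # summary statistic m_t = <w_t, w_star>
for _ in range(steps):
    x = rng.normal(size=d)               # fresh Gaussian sample (online SGD)
    g = (x @ w - x @ w_star) * x         # per-sample squared-loss gradient
    w -= (delta / d) * g                 # constant step-size of order 1/d
    overlap.append(w @ w_star)

print("final overlap with teacher:", overlap[-1])
```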

Private stochastic convex optimization: optimal rates in linear time

V Feldman, T Koren, K Talwar - Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, 2020 - dl.acm.org
We study differentially private (DP) algorithms for stochastic convex optimization: the
problem of minimizing the population loss given iid samples from a distribution over convex …
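
For flavor, a generic DP-SGD-style noisy clipped-gradient step (a standard construction, not the paper's linear-time algorithm; `clip` and `sigma` are placeholder privacy parameters):

```python
import numpy as np

def dp_sgd_step(w, per_example_grads, lr, clip=1.0, sigma=1.0, rng=None):
    """One generic DP-SGD step: clip each per-example gradient to norm
    <= clip, average, then add Gaussian noise scaled to the clip norm."""
    rng = rng or np.random.default_rng()
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    noisy = np.mean(clipped, axis=0) + rng.normal(
        scale=sigma * clip / len(per_example_grads), size=w.shape)
    return w - lr * noisy

# Toy usage with two hand-written per-example gradients.
w = dp_sgd_step(np.zeros(3),
                [np.array([3.0, 0.0, 0.0]), np.array([0.1, 0.2, 0.3])],
                lr=0.1)
```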

Online distributed algorithms for online noncooperative games with stochastic cost functions: high probability bound of regrets

K Lu - IEEE Transactions on Automatic Control, 2024 - ieeexplore.ieee.org
In this article, online noncooperative games without full decision information are studied,
where the goal of players is to seek the Nash equilibria in a distributed manner. Different …
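
As a simplified sketch of gradient-based equilibrium seeking (full-information simultaneous gradient play, much simpler than the paper's distributed partial-information setting; costs and constants are hypothetical):

```python
import numpy as np

def gradient_play(grads, x0, lr=0.05, rounds=200, radius=1.0):
    """Simultaneous gradient play: each player i takes a gradient step on
    its own cost in its own action, then projects back to [-radius, radius]."""
    x = x0.copy()
    for _ in range(rounds):
        x = np.array([x[i] - lr * grads[i](x) for i in range(len(x))])
        x = np.clip(x, -radius, radius)   # project each action
    return x

# Two-player toy game: J_i(x) = (x_i - 0.5 * x_{1-i})^2, Nash equilibrium at 0.
grads = [lambda x, i=i: 2 * (x[i] - 0.5 * x[1 - i]) for i in range(2)]
print(gradient_play(grads, np.array([1.0, -1.0])))
```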

High probability convergence of stochastic gradient methods

Z Liu, TD Nguyen, TH Nguyen… - International Conference on Machine Learning, 2023 - proceedings.mlr.press
In this work, we describe a generic approach to show convergence with high probability for
both stochastic convex and non-convex optimization with sub-Gaussian noise. In previous …
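
A quick empirical illustration of what a high-probability statement controls: run SGD with Gaussian (hence sub-Gaussian) gradient noise many times on a toy quadratic and inspect an upper quantile of the best gradient norm along each trajectory; all parameters are illustrative:

```python
import numpy as np

def min_grad_norm_along_sgd(rng, steps=500, lr=0.1, noise=0.5):
    """SGD on f(w) = 0.5 * ||w||^2 (so the true gradient at w is w) with
    Gaussian gradient noise; return the best gradient norm on the path."""
    w, best = np.ones(10), np.inf
    for _ in range(steps):
        best = min(best, np.linalg.norm(w))            # true gradient norm
        w -= lr * (w + noise * rng.normal(size=10))    # noisy gradient step
    return best

rng = np.random.default_rng(0)
runs = np.array([min_grad_norm_along_sgd(rng) for _ in range(200)])
# The 95th percentile across runs is the quantity a high-probability
# (delta = 0.05) convergence bound upper-bounds.
print("95th percentile of best gradient norm:", np.quantile(runs, 0.95))
```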

On the convergence of adaptive gradient methods for nonconvex optimization

D Zhou, J Chen, Y Cao, Z Yang, Q Gu - arXiv preprint arXiv:1808.05671, 2018 - arxiv.org
Adaptive gradient methods are workhorses in deep learning. However, the convergence
guarantees of adaptive gradient methods for nonconvex optimization have not been …
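
A minimal AMSGrad sketch, one of the adaptive methods such analyses cover (hyperparameters are the usual defaults, chosen here purely for illustration):

```python
import numpy as np

def amsgrad(grad_fn, w0, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8, steps=2000):
    """Minimal AMSGrad: Adam with a running max of the second-moment
    estimate, which keeps the effective step-size non-increasing."""
    w = w0.copy()
    m = np.zeros_like(w); v = np.zeros_like(w); vhat = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        m = b1 * m + (1 - b1) * g        # first-moment (momentum) estimate
        v = b2 * v + (1 - b2) * g * g    # second-moment estimate
        vhat = np.maximum(vhat, v)       # monotone normalizer (the AMSGrad fix)
        w -= lr * m / (np.sqrt(vhat) + eps)
    return w

# Toy nonconvex objective f(w) = sum(sin(w)), whose gradient is cos(w).
print(amsgrad(np.cos, np.zeros(3)))
```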

The step decay schedule: A near optimal, geometrically decaying learning rate procedure for least squares

R Ge, SM Kakade, R Kidambi… - Advances in Neural Information Processing Systems, 2019 - proceedings.neurips.cc
Minimax optimal convergence rates for numerous classes of stochastic convex optimization
problems are well characterized, where the majority of results utilize iterate averaged …
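
A sketch of the step-decay idea on least squares: run SGD at a constant step-size for a phase, then halve it geometrically between phases (the problem instance and constants below are hypothetical):

```python
import numpy as np

def step_decay_sgd(A, b, w0, lr0=0.1, phases=5, inner=200, rng=None):
    """SGD on least squares 0.5 * ||A w - b||^2 with a step-decay schedule:
    constant step-size within a phase, halved between phases."""
    rng = rng or np.random.default_rng(0)
    w, lr, n = w0.copy(), lr0, A.shape[0]
    for _ in range(phases):
        for _ in range(inner):
            i = rng.integers(n)              # sample one row uniformly
            w -= lr * (A[i] @ w - b[i]) * A[i]
        lr *= 0.5                            # geometric (step) decay
    return w

A = np.random.default_rng(1).normal(size=(100, 5))
w_true = np.arange(5.0)
print(step_decay_sgd(A, A @ w_true, np.zeros(5)))   # should approach w_true
```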

High probability generalization bounds for uniformly stable algorithms with nearly optimal rate

V Feldman, J Vondrak - Conference on Learning Theory, 2019 - proceedings.mlr.press
Algorithmic stability is a classical approach to understanding and analysis of the
generalization error of learning algorithms. A notable weakness of most stability-based …
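
An empirical proxy for uniform stability (illustrative only, not the paper's proof technique): refit ridge regression, a textbook uniformly stable algorithm, after swapping one training point, and measure how much a prediction moves:

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Ridge regression, a classic example of a uniformly stable algorithm."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * len(y) * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

# Swap one training point and measure the change in a test prediction:
# an empirical proxy for the uniform-stability parameter.
X2, y2 = X.copy(), y.copy()
X2[0], y2[0] = rng.normal(size=3), rng.normal()
w, w2 = ridge_fit(X, y), ridge_fit(X2, y2)
x_test = rng.normal(size=3)
print("prediction change after one-point swap:", abs(x_test @ (w - w2)))
```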

Almost sure convergence rates for stochastic gradient descent and stochastic heavy ball

O Sebbouh, RM Gower… - Conference on Learning Theory, 2021 - proceedings.mlr.press
We study stochastic gradient descent (SGD) and the stochastic heavy ball method (SHB,
otherwise known as the momentum method) for the general stochastic approximation …
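
A minimal stochastic heavy ball sketch with the standard update v_{t+1} = beta*v_t - lr*g_t, w_{t+1} = w_t + v_{t+1} (toy objective and constants assumed):

```python
import numpy as np

def stochastic_heavy_ball(grad_fn, w0, lr=0.05, beta=0.9, steps=500,
                          noise=0.1, rng=None):
    """Stochastic heavy ball (momentum): keep a velocity buffer v and
    move along it, where g_t is a noisy gradient."""
    rng = rng or np.random.default_rng(0)
    w, v = w0.copy(), np.zeros_like(w0)
    for _ in range(steps):
        g = grad_fn(w) + noise * rng.normal(size=w.shape)
        v = beta * v - lr * g
        w = w + v
    return w

print(stochastic_heavy_ball(lambda w: w, np.ones(4)))  # f(w) = 0.5 * ||w||^2
```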

Improved convergence in high probability of clipped gradient methods with heavy tailed noise

TD Nguyen, TH Nguyen, A Ene… - Advances in Neural Information Processing Systems, 2023 - proceedings.neurips.cc
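
A minimal clipped-SGD sketch under heavy-tailed (infinite-variance Student-t) gradient noise; the clipping threshold and step-size are illustrative:

```python
import numpy as np

def clipped_sgd(grad_fn, w0, lr=0.01, clip=1.0, steps=2000, rng=None):
    """Clipped SGD: rescale each stochastic gradient so its norm is at
    most `clip`, which tames heavy-tailed noise."""
    rng = rng or np.random.default_rng(0)
    w = w0.copy()
    for _ in range(steps):
        g = grad_fn(w) + rng.standard_t(df=2, size=w.shape)  # infinite-variance noise
        g *= min(1.0, clip / (np.linalg.norm(g) + 1e-12))    # clip to norm <= clip
        w -= lr * g
    return w

print(clipped_sgd(lambda w: w, np.ones(3)))  # toy quadratic f(w) = 0.5 * ||w||^2
```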