Robustness to unbounded smoothness of generalized SignSGD

M Crawshaw, M Liu, F Orabona… - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
Traditional analyses in non-convex optimization typically rely on the smoothness
assumption, namely requiring the gradients to be Lipschitz. However, recent evidence …
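
A minimal sketch of the sign-based update at the heart of such methods (not the paper's generalized algorithm; `lr` and `beta` are illustrative values):

```python
import numpy as np

def signsgd_momentum(grad_fn, w0, lr=0.01, beta=0.9, steps=100):
    """SignSGD with momentum: the step direction is the elementwise
    sign of an exponential moving average of stochastic gradients."""
    w = w0.copy()
    m = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        m = beta * m + (1 - beta) * g   # momentum buffer
        w -= lr * np.sign(m)            # sign-based update
    return w

# Toy usage: minimize the quadratic f(w) = 0.5 * ||w||^2.
print(signsgd_momentum(lambda w: w, np.ones(5)))
```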

High-dimensional limit theorems for SGD: Effective dynamics and critical scaling

G Ben Arous, R Gheissari… - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in
the high-dimensional regime. We prove limit theorems for the trajectories of summary …
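
To illustrate the setting, a toy sketch of constant step-size online SGD on a hypothetical teacher-student least-squares model, tracking a one-dimensional summary statistic (the overlap with the teacher); the `delta / d` step-size scaling mirrors the high-dimensional regime, but the model and constants are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d, steps, delta = 1000, 5000, 0.5        # dimension, iterations, step-size scale
w_star = np.ones(d) / np.sqrt(d)         # teacher direction (unit norm)
w = rng.normal(size=d) / np.sqrt(d)      # random student initialization

overlap = []                             # summary statistic m_t = <w_t, w_star>
for _ in range(steps):
    x = rng.normal(size=d)               # fresh Gaussian sample (online SGD)
    g = (x @ w - x @ w_star) * x         # per-sample squared-loss gradient
    w -= (delta / d) * g                 # constant step-size of order 1/d
    overlap.append(w @ w_star)

print("final overlap with teacher:", overlap[-1])
```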

Private stochastic convex optimization: optimal rates in linear time

V Feldman, T Koren, K Talwar - Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing, 2020 - dl.acm.org
We study differentially private (DP) algorithms for stochastic convex optimization: the
problem of minimizing the population loss given iid samples from a distribution over convex …
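
For flavor, a generic DP-SGD-style noisy clipped-gradient step (a standard construction, not the paper's linear-time algorithm; `clip` and `sigma` are placeholder privacy parameters):

```python
import numpy as np

def dp_sgd_step(w, per_example_grads, lr, clip=1.0, sigma=1.0, rng=None):
    """One generic DP-SGD step: clip each per-example gradient to norm
    <= clip, average, then add Gaussian noise scaled to the clip norm."""
    rng = rng or np.random.default_rng()
    clipped = [g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    noisy = np.mean(clipped, axis=0) + rng.normal(
        scale=sigma * clip / len(per_example_grads), size=w.shape)
    return w - lr * noisy

# Toy usage with two hand-written per-example gradients.
w = dp_sgd_step(np.zeros(3),
                [np.array([3.0, 0.0, 0.0]), np.array([0.1, 0.2, 0.3])],
                lr=0.1)
```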

Online distributed algorithms for online noncooperative games with stochastic cost functions: high probability bound of regrets

K Lu - IEEE Transactions on Automatic Control, 2024 - ieeexplore.ieee.org
In this article, online noncooperative games without full decision information are studied,
where the goal of players is to seek the Nash equilibria in a distributed manner. Different …
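
As a simplified sketch of gradient-based equilibrium seeking (full-information simultaneous gradient play, much simpler than the paper's distributed partial-information setting; costs and constants are hypothetical):

```python
import numpy as np

def gradient_play(grads, x0, lr=0.05, rounds=200, radius=1.0):
    """Simultaneous gradient play: each player i takes a gradient step on
    its own cost in its own action, then projects back to [-radius, radius]."""
    x = x0.copy()
    for _ in range(rounds):
        x = np.array([x[i] - lr * grads[i](x) for i in range(len(x))])
        x = np.clip(x, -radius, radius)   # project each action
    return x

# Two-player toy game: J_i(x) = (x_i - 0.5 * x_{1-i})^2, Nash equilibrium at 0.
grads = [lambda x, i=i: 2 * (x[i] - 0.5 * x[1 - i]) for i in range(2)]
print(gradient_play(grads, np.array([1.0, -1.0])))
```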

High probability convergence of stochastic gradient methods

Z Liu, TD Nguyen, TH Nguyen… - International Conference on Machine Learning, 2023 - proceedings.mlr.press
In this work, we describe a generic approach to show convergence with high probability for
both stochastic convex and non-convex optimization with sub-Gaussian noise. In previous …
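
A quick empirical illustration of what a high-probability statement controls: run SGD with Gaussian (hence sub-Gaussian) gradient noise many times on a toy quadratic and inspect an upper quantile of the best gradient norm along each trajectory; all parameters are illustrative:

```python
import numpy as np

def min_grad_norm_along_sgd(rng, steps=500, lr=0.1, noise=0.5):
    """SGD on f(w) = 0.5 * ||w||^2 (so the true gradient at w is w) with
    Gaussian gradient noise; return the best gradient norm on the path."""
    w, best = np.ones(10), np.inf
    for _ in range(steps):
        best = min(best, np.linalg.norm(w))            # true gradient norm
        w -= lr * (w + noise * rng.normal(size=10))    # noisy gradient step
    return best

rng = np.random.default_rng(0)
runs = np.array([min_grad_norm_along_sgd(rng) for _ in range(200)])
# The 95th percentile across runs is the quantity a high-probability
# (delta = 0.05) convergence bound upper-bounds.
print("95th percentile of best gradient norm:", np.quantile(runs, 0.95))
```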

On the convergence of adaptive gradient methods for nonconvex optimization

D Zhou, J Chen, Y Cao, Z Yang, Q Gu - arXiv preprint arXiv:1808.05671, 2018 - arxiv.org
Adaptive gradient methods are workhorses in deep learning. However, the convergence
guarantees of adaptive gradient methods for nonconvex optimization have not been …
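
A minimal AMSGrad sketch, one of the adaptive methods such analyses cover (hyperparameters are the usual defaults, chosen here purely for illustration):

```python
import numpy as np

def amsgrad(grad_fn, w0, lr=1e-2, b1=0.9, b2=0.999, eps=1e-8, steps=2000):
    """Minimal AMSGrad: Adam with a running max of the second-moment
    estimate, which keeps the effective step-size non-increasing."""
    w = w0.copy()
    m = np.zeros_like(w); v = np.zeros_like(w); vhat = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        m = b1 * m + (1 - b1) * g        # first-moment (momentum) estimate
        v = b2 * v + (1 - b2) * g * g    # second-moment estimate
        vhat = np.maximum(vhat, v)       # monotone normalizer (the AMSGrad fix)
        w -= lr * m / (np.sqrt(vhat) + eps)
    return w

# Toy nonconvex objective f(w) = sum(sin(w)), whose gradient is cos(w).
print(amsgrad(np.cos, np.zeros(3)))
```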

The step decay schedule: A near optimal, geometrically decaying learning rate procedure for least squares

R Ge, SM Kakade, R Kidambi… - Advances in Neural Information Processing Systems, 2019 - proceedings.neurips.cc
Minimax optimal convergence rates for numerous classes of stochastic convex optimization
problems are well characterized, where the majority of results utilize iterate averaged …
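
A sketch of the step-decay idea on least squares: run SGD at a constant step-size for a phase, then halve it geometrically between phases (the problem instance and constants below are hypothetical):

```python
import numpy as np

def step_decay_sgd(A, b, w0, lr0=0.1, phases=5, inner=200, rng=None):
    """SGD on least squares 0.5 * ||A w - b||^2 with a step-decay schedule:
    constant step-size within a phase, halved between phases."""
    rng = rng or np.random.default_rng(0)
    w, lr, n = w0.copy(), lr0, A.shape[0]
    for _ in range(phases):
        for _ in range(inner):
            i = rng.integers(n)              # sample one row uniformly
            w -= lr * (A[i] @ w - b[i]) * A[i]
        lr *= 0.5                            # geometric (step) decay
    return w

A = np.random.default_rng(1).normal(size=(100, 5))
w_true = np.arange(5.0)
print(step_decay_sgd(A, A @ w_true, np.zeros(5)))   # should approach w_true
```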

High probability generalization bounds for uniformly stable algorithms with nearly optimal rate

V Feldman, J Vondrak - Conference on Learning Theory, 2019 - proceedings.mlr.press
Algorithmic stability is a classical approach to understanding and analysis of the
generalization error of learning algorithms. A notable weakness of most stability-based …
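
An empirical proxy for uniform stability (illustrative only, not the paper's proof technique): refit ridge regression, a textbook uniformly stable algorithm, after swapping one training point, and measure how much a prediction moves:

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Ridge regression, a classic example of a uniformly stable algorithm."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * len(y) * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

# Swap one training point and measure the change in a test prediction:
# an empirical proxy for the uniform-stability parameter.
X2, y2 = X.copy(), y.copy()
X2[0], y2[0] = rng.normal(size=3), rng.normal()
w, w2 = ridge_fit(X, y), ridge_fit(X2, y2)
x_test = rng.normal(size=3)
print("prediction change after one-point swap:", abs(x_test @ (w - w2)))
```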

Almost sure convergence rates for stochastic gradient descent and stochastic heavy ball

O Sebbouh, RM Gower… - Conference on Learning Theory, 2021 - proceedings.mlr.press
We study stochastic gradient descent (SGD) and the stochastic heavy ball method (SHB,
otherwise known as the momentum method) for the general stochastic approximation …
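
A minimal stochastic heavy ball sketch with the standard update v_{t+1} = beta*v_t - lr*g_t, w_{t+1} = w_t + v_{t+1} (toy objective and constants assumed):

```python
import numpy as np

def stochastic_heavy_ball(grad_fn, w0, lr=0.05, beta=0.9, steps=500,
                          noise=0.1, rng=None):
    """Stochastic heavy ball (momentum): keep a velocity buffer v and
    move along it, where g_t is a noisy gradient."""
    rng = rng or np.random.default_rng(0)
    w, v = w0.copy(), np.zeros_like(w0)
    for _ in range(steps):
        g = grad_fn(w) + noise * rng.normal(size=w.shape)
        v = beta * v - lr * g
        w = w + v
    return w

print(stochastic_heavy_ball(lambda w: w, np.ones(4)))  # f(w) = 0.5 * ||w||^2
```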

Improved convergence in high probability of clipped gradient methods with heavy tailed noise

TD Nguyen, TH Nguyen, A Ene… - Advances in Neural Information Processing Systems, 2023 - proceedings.neurips.cc
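
A minimal clipped-SGD sketch under heavy-tailed (infinite-variance Student-t) gradient noise; the clipping threshold and step-size are illustrative:

```python
import numpy as np

def clipped_sgd(grad_fn, w0, lr=0.01, clip=1.0, steps=2000, rng=None):
    """Clipped SGD: rescale each stochastic gradient so its norm is at
    most `clip`, which tames heavy-tailed noise."""
    rng = rng or np.random.default_rng(0)
    w = w0.copy()
    for _ in range(steps):
        g = grad_fn(w) + rng.standard_t(df=2, size=w.shape)  # infinite-variance noise
        g *= min(1.0, clip / (np.linalg.norm(g) + 1e-12))    # clip to norm <= clip
        w -= lr * g
    return w

print(clipped_sgd(lambda w: w, np.ones(3)))  # toy quadratic f(w) = 0.5 * ||w||^2
```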