Recent advances in stochastic gradient descent in deep learning

Y Tian, Y Zhang, H Zhang - Mathematics, 2023 - mdpi.com
In the age of artificial intelligence, finding the best approach to handling huge amounts of data is a
tremendously motivating and hard problem. Among machine learning models, stochastic …

Adaptive SGD with Polyak stepsize and line-search: Robust convergence and variance reduction

X Jiang, SU Stich - Advances in Neural Information …, 2024 - proceedings.neurips.cc
The recently proposed stochastic Polyak stepsize (SPS) and stochastic line-search (SLS) for
SGD have shown remarkable effectiveness when training over-parameterized models …
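For orientation, the stochastic Polyak stepsize sets each step's learning rate from the current sample's loss and gradient norm. The NumPy sketch below shows the capped variant on a toy least-squares problem; the cap gamma_max, the constant c, and the data are illustrative assumptions, and the stochastic line-search component of the paper is not shown.

import numpy as np

rng = np.random.default_rng(0)
A, b = rng.normal(size=(100, 10)), rng.normal(size=100)   # toy least-squares data (assumption)
x = np.zeros(10)
c, gamma_max = 0.5, 10.0                                   # illustrative constants, not the paper's tuning

for t in range(1000):
    i = rng.integers(len(b))
    residual = A[i] @ x - b[i]
    loss_i = 0.5 * residual ** 2          # per-sample loss f_i(x); f_i* = 0 assumed (interpolation)
    grad_i = residual * A[i]              # gradient of f_i at x
    # Stochastic Polyak stepsize with a cap: gamma_t = min(f_i(x) / (c * ||grad f_i(x)||^2), gamma_max)
    gamma = min(loss_i / (c * (grad_i @ grad_i) + 1e-12), gamma_max)
    x -= gamma * grad_i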

Adaptive gradient methods at the edge of stability

JM Cohen, B Ghorbani, S Krishnan, N Agarwal… - arXiv preprint arXiv …, 2022 - arxiv.org
Very little is known about the training dynamics of adaptive gradient methods like Adam in
deep learning. In this paper, we shed light on the behavior of these algorithms in the full …
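By way of context, edge-of-stability analyses track the sharpness, i.e. the largest eigenvalue of the training-loss Hessian (or a preconditioned analogue for adaptive methods), over the course of training. The PyTorch sketch below estimates plain sharpness by power iteration on Hessian-vector products; the linear model and random batch are placeholders, and this is not the paper's preconditioned measurement.

import torch

# Placeholder model and batch; real experiments use actual networks and datasets.
model = torch.nn.Linear(20, 1)
xb, yb = torch.randn(64, 20), torch.randn(64, 1)
loss = torch.nn.functional.mse_loss(model(xb), yb)

params = [p for p in model.parameters() if p.requires_grad]
grads = torch.autograd.grad(loss, params, create_graph=True)

# Power iteration: each Hessian-vector product is one extra backward pass.
v = [torch.randn_like(p) for p in params]
for _ in range(20):
    hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
    norm = torch.sqrt(sum((h * h).sum() for h in hv))
    v = [h / (norm + 1e-12) for h in hv]

hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
sharpness = sum((h * vi).sum() for h, vi in zip(hv, v))   # Rayleigh quotient v^T H v
print(float(sharpness))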

Stochastic gradient descent with noise of machine learning type. Part I: Discrete time analysis

S Wojtowytsch - Journal of Nonlinear Science, 2023 - Springer
Stochastic gradient descent (SGD) is one of the most popular algorithms in modern machine
learning. The noise encountered in these applications is different from that in many …
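"Noise of machine learning type" refers to minibatch gradient noise whose magnitude scales with the loss itself, so it vanishes at interpolating minimizers rather than acting as a constant additive perturbation. The toy NumPy sketch below (over-parameterized least squares; all constants are illustrative assumptions) makes that contrast concrete.

import numpy as np

rng = np.random.default_rng(1)
# Over-parameterized (interpolating) least squares: more parameters than samples,
# so a point x_star with zero loss on every sample exists.
n, d = 20, 50
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star                      # labels are exactly realizable

def sample_grad_variance(x, trials=2000):
    # Variance of the single-sample gradient around the full-batch gradient.
    full = A.T @ (A @ x - b) / n
    devs = []
    for _ in range(trials):
        i = rng.integers(n)
        g = (A[i] @ x - b[i]) * A[i]
        devs.append(np.sum((g - full) ** 2))
    return np.mean(devs)

print(sample_grad_variance(rng.normal(size=d)))   # large noise far from the minimizer
print(sample_grad_variance(x_star))               # ~0: noise vanishes at the interpolating solution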

Dynamics of SGD with stochastic Polyak stepsizes: Truly adaptive variants and convergence to exact solution

A Orvieto, S Lacoste-Julien… - Advances in Neural …, 2022 - proceedings.neurips.cc
Recently, Loizou et al. (2021) proposed and analyzed stochastic gradient descent
(SGD) with stochastic Polyak stepsize (SPS). The proposed SPS comes with strong …

Deep learning regularization techniques to genomics data

H Soumare, A Benkahla, N Gmati - Array, 2021 - Elsevier
Deep Learning algorithms have achieved great success in many domains where large-scale
datasets are used. However, training these algorithms on high-dimensional data …

Nest your adaptive algorithm for parameter-agnostic nonconvex minimax optimization

J Yang, X Li, N He - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Adaptive algorithms like AdaGrad and AMSGrad are successful in nonconvex optimization
owing to their parameter-agnostic ability, requiring no a priori knowledge about problem …

Sequential convergence of AdaGrad algorithm for smooth convex optimization

C Traoré, E Pauwels - Operations Research Letters, 2021 - Elsevier
We prove that the iterates produced by either the scalar step size variant or the
coordinatewise variant of the AdaGrad algorithm are convergent sequences when applied to …
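The two variants the abstract refers to fit in a few lines; the NumPy sketch below shows both, with the quadratic test objective, base stepsize eta, and epsilon chosen purely for illustration.

import numpy as np

def adagrad(grad, x0, eta=0.1, eps=1e-8, steps=500, coordinatewise=True):
    x = np.array(x0, dtype=float)
    acc = np.zeros_like(x) if coordinatewise else 0.0
    for _ in range(steps):
        g = grad(x)
        if coordinatewise:
            acc += g ** 2                 # per-coordinate sum of squared gradients
        else:
            acc += g @ g                  # scalar variant: accumulated squared gradient norms
        x -= eta * g / (np.sqrt(acc) + eps)
    return x

# Smooth convex example: a simple quadratic with minimizer at the origin.
grad = lambda x: np.array([2.0, 0.2]) * x
print(adagrad(grad, [5.0, 5.0], coordinatewise=True))
print(adagrad(grad, [5.0, 5.0], coordinatewise=False))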

Choosing the sample with lowest loss makes SGD robust

V Shah, X Wu, S Sanghavi - International Conference on …, 2020 - proceedings.mlr.press
The presence of outliers can significantly skew the parameters of machine
learning models trained via stochastic gradient descent (SGD). In this paper we propose a …
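The selection rule described here is simple to sketch: at each step draw a small set of candidate samples, keep only the one with the lowest current loss, and take the gradient step on it. The NumPy toy below (linear regression with corrupted labels; candidate-set size k and stepsize are illustrative assumptions, not the paper's experimental setup) shows the idea.

import numpy as np

rng = np.random.default_rng(2)
n, d = 200, 5
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d)
b[:20] += 50.0 * rng.normal(size=20)       # a few grossly corrupted labels (outliers)

x, eta, k = np.zeros(d), 0.01, 8
for t in range(5000):
    idx = rng.choice(n, size=k, replace=False)       # candidate minibatch
    losses = 0.5 * (A[idx] @ x - b[idx]) ** 2        # per-sample losses
    j = idx[np.argmin(losses)]                       # keep only the lowest-loss sample
    x -= eta * (A[j] @ x - b[j]) * A[j]              # SGD step on that sample alone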

Optimal algorithms for stochastic multi-level compositional optimization

W Jiang, B Wang, Y Wang, L Zhang… - … on Machine Learning, 2022 - proceedings.mlr.press
In this paper, we investigate the problem of stochastic multi-level compositional optimization,
where the objective function is a composition of multiple smooth but possibly non-convex …
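For reference, a K-level compositional objective has the nested form F(x) = f_K(f_{K-1}(...f_1(x)...)), and its gradient is the chain-rule product of the level Jacobians; the difficulty is that each f_k and its Jacobian can only be sampled, so plugging noisy inner estimates into the outer functions biases the naive gradient estimator. The sketch below illustrates the structure for K = 3; the concrete functions are arbitrary placeholders, not from the paper.

import numpy as np

# Three-level composition F(x) = f3(f2(f1(x))) with placeholder smooth maps.
f1 = lambda x: np.array([np.sum(x ** 2), np.sum(x)])       # R^d -> R^2
J1 = lambda x: np.vstack([2 * x, np.ones_like(x)])         # Jacobian of f1, shape (2, d)
f2 = lambda u: np.array([np.tanh(u[0]) + u[1]])            # R^2 -> R^1
J2 = lambda u: np.array([[1 - np.tanh(u[0]) ** 2, 1.0]])   # Jacobian of f2, shape (1, 2)
f3 = lambda v: 0.5 * v[0] ** 2                             # R^1 -> R
g3 = lambda v: np.array([v[0]])                            # gradient of f3, shape (1,)

def grad_F(x):
    u = f1(x)
    v = f2(u)
    # Chain rule: grad F(x) = J1(x)^T J2(u)^T grad f3(v)
    return J1(x).T @ (J2(u).T @ g3(v))

print(grad_F(np.ones(4)))
# In the stochastic setting, u and v above are only available as noisy estimates,
# which makes the naive plug-in gradient estimator biased; the paper concerns
# algorithms that handle this at optimal sample complexity.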