Recent advances in stochastic gradient descent in deep learning

Y Tian, Y Zhang, H Zhang - Mathematics, 2023 - mdpi.com
In the age of artificial intelligence, finding the best approach to handling huge amounts of data is a
tremendously motivating and hard problem. Among machine learning models, stochastic …

Adaptive SGD with Polyak stepsize and line-search: Robust convergence and variance reduction

X Jiang, SU Stich - Advances in Neural Information …, 2024 - proceedings.neurips.cc
The recently proposed stochastic Polyak stepsize (SPS) and stochastic line-search (SLS) for
SGD have shown remarkable effectiveness when training over-parameterized models …
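For orientation, the stochastic Polyak stepsize sets each step's learning rate from the current sample's loss and gradient norm. The NumPy sketch below shows the capped variant on a toy least-squares problem; the cap gamma_max, the constant c, and the data are illustrative assumptions, and the stochastic line-search component of the paper is not shown.

import numpy as np

rng = np.random.default_rng(0)
A, b = rng.normal(size=(100, 10)), rng.normal(size=100)   # toy least-squares data (assumption)
x = np.zeros(10)
c, gamma_max = 0.5, 10.0                                   # illustrative constants, not the paper's tuning

for t in range(1000):
    i = rng.integers(len(b))
    residual = A[i] @ x - b[i]
    loss_i = 0.5 * residual ** 2          # per-sample loss f_i(x); f_i* = 0 assumed (interpolation)
    grad_i = residual * A[i]              # gradient of f_i at x
    # Stochastic Polyak stepsize with a cap: gamma_t = min(f_i(x) / (c * ||grad f_i(x)||^2), gamma_max)
    gamma = min(loss_i / (c * (grad_i @ grad_i) + 1e-12), gamma_max)
    x -= gamma * grad_i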

Adaptive gradient methods at the edge of stability

JM Cohen, B Ghorbani, S Krishnan, N Agarwal… - arXiv preprint arXiv …, 2022 - arxiv.org
Very little is known about the training dynamics of adaptive gradient methods like Adam in
deep learning. In this paper, we shed light on the behavior of these algorithms in the full …
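By way of context, edge-of-stability analyses track the sharpness, i.e. the largest eigenvalue of the training-loss Hessian (or a preconditioned analogue for adaptive methods), over the course of training. The PyTorch sketch below estimates plain sharpness by power iteration on Hessian-vector products; the linear model and random batch are placeholders, and this is not the paper's preconditioned measurement.

import torch

# Placeholder model and batch; real experiments use actual networks and datasets.
model = torch.nn.Linear(20, 1)
xb, yb = torch.randn(64, 20), torch.randn(64, 1)
loss = torch.nn.functional.mse_loss(model(xb), yb)

params = [p for p in model.parameters() if p.requires_grad]
grads = torch.autograd.grad(loss, params, create_graph=True)

# Power iteration: each Hessian-vector product is one extra backward pass.
v = [torch.randn_like(p) for p in params]
for _ in range(20):
    hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
    norm = torch.sqrt(sum((h * h).sum() for h in hv))
    v = [h / (norm + 1e-12) for h in hv]

hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
sharpness = sum((h * vi).sum() for h, vi in zip(hv, v))   # Rayleigh quotient v^T H v
print(float(sharpness))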

Stochastic gradient descent with noise of machine learning type. Part I: Discrete time analysis

S Wojtowytsch - Journal of Nonlinear Science, 2023 - Springer
Stochastic gradient descent (SGD) is one of the most popular algorithms in modern machine
learning. The noise encountered in these applications is different from that in many …
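"Noise of machine learning type" refers to minibatch gradient noise whose magnitude scales with the loss itself, so it vanishes at interpolating minimizers rather than acting as a constant additive perturbation. The toy NumPy sketch below (over-parameterized least squares; all constants are illustrative assumptions) makes that contrast concrete.

import numpy as np

rng = np.random.default_rng(1)
# Over-parameterized (interpolating) least squares: more parameters than samples,
# so a point x_star with zero loss on every sample exists.
n, d = 20, 50
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star                      # labels are exactly realizable

def sample_grad_variance(x, trials=2000):
    # Variance of the single-sample gradient around the full-batch gradient.
    full = A.T @ (A @ x - b) / n
    devs = []
    for _ in range(trials):
        i = rng.integers(n)
        g = (A[i] @ x - b[i]) * A[i]
        devs.append(np.sum((g - full) ** 2))
    return np.mean(devs)

print(sample_grad_variance(rng.normal(size=d)))   # large noise far from the minimizer
print(sample_grad_variance(x_star))               # ~0: noise vanishes at the interpolating solution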

Dynamics of SGD with stochastic Polyak stepsizes: Truly adaptive variants and convergence to exact solution

A Orvieto, S Lacoste-Julien… - Advances in Neural …, 2022 - proceedings.neurips.cc
Recently, Loizou et al. (2021) proposed and analyzed stochastic gradient descent
(SGD) with stochastic Polyak stepsize (SPS). The proposed SPS comes with strong …

Deep learning regularization techniques to genomics data

H Soumare, A Benkahla, N Gmati - Array, 2021 - Elsevier
Deep Learning algorithms have achieved great success in many domains where large-scale
datasets are used. However, training these algorithms on high-dimensional data …

Nest your adaptive algorithm for parameter-agnostic nonconvex minimax optimization

J Yang, X Li, N He - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Adaptive algorithms like AdaGrad and AMSGrad are successful in nonconvex optimization
owing to their parameter-agnostic ability, requiring no a priori knowledge about problem …

Sequential convergence of AdaGrad algorithm for smooth convex optimization

C Traoré, E Pauwels - Operations Research Letters, 2021 - Elsevier
We prove that the iterates produced by either the scalar step size variant or the
coordinatewise variant of the AdaGrad algorithm are convergent sequences when applied to …
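The two variants the abstract refers to fit in a few lines; the NumPy sketch below shows both, with the quadratic test objective, base stepsize eta, and epsilon chosen purely for illustration.

import numpy as np

def adagrad(grad, x0, eta=0.1, eps=1e-8, steps=500, coordinatewise=True):
    x = np.array(x0, dtype=float)
    acc = np.zeros_like(x) if coordinatewise else 0.0
    for _ in range(steps):
        g = grad(x)
        if coordinatewise:
            acc += g ** 2                 # per-coordinate sum of squared gradients
        else:
            acc += g @ g                  # scalar variant: accumulated squared gradient norms
        x -= eta * g / (np.sqrt(acc) + eps)
    return x

# Smooth convex example: a simple quadratic with minimizer at the origin.
grad = lambda x: np.array([2.0, 0.2]) * x
print(adagrad(grad, [5.0, 5.0], coordinatewise=True))
print(adagrad(grad, [5.0, 5.0], coordinatewise=False))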

Choosing the sample with lowest loss makes SGD robust

V Shah, X Wu, S Sanghavi - International Conference on …, 2020 - proceedings.mlr.press
The presence of outliers can significantly skew the parameters of machine
learning models trained via stochastic gradient descent (SGD). In this paper we propose a …
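The selection rule described here is simple to sketch: at each step draw a small set of candidate samples, keep only the one with the lowest current loss, and take the gradient step on it. The NumPy toy below (linear regression with corrupted labels; candidate-set size k and stepsize are illustrative assumptions, not the paper's experimental setup) shows the idea.

import numpy as np

rng = np.random.default_rng(2)
n, d = 200, 5
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d)
b[:20] += 50.0 * rng.normal(size=20)       # a few grossly corrupted labels (outliers)

x, eta, k = np.zeros(d), 0.01, 8
for t in range(5000):
    idx = rng.choice(n, size=k, replace=False)       # candidate minibatch
    losses = 0.5 * (A[idx] @ x - b[idx]) ** 2        # per-sample losses
    j = idx[np.argmin(losses)]                       # keep only the lowest-loss sample
    x -= eta * (A[j] @ x - b[j]) * A[j]              # SGD step on that sample alone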

Optimal algorithms for stochastic multi-level compositional optimization

W Jiang, B Wang, Y Wang, L Zhang… - … on Machine Learning, 2022 - proceedings.mlr.press
In this paper, we investigate the problem of stochastic multi-level compositional optimization,
where the objective function is a composition of multiple smooth but possibly non-convex …
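For reference, a K-level compositional objective has the nested form F(x) = f_K(f_{K-1}(...f_1(x)...)), and its gradient is the chain-rule product of the level Jacobians; the difficulty is that each f_k and its Jacobian can only be sampled, so plugging noisy inner estimates into the outer functions biases the naive gradient estimator. The sketch below illustrates the structure for K = 3; the concrete functions are arbitrary placeholders, not from the paper.

import numpy as np

# Three-level composition F(x) = f3(f2(f1(x))) with placeholder smooth maps.
f1 = lambda x: np.array([np.sum(x ** 2), np.sum(x)])       # R^d -> R^2
J1 = lambda x: np.vstack([2 * x, np.ones_like(x)])         # Jacobian of f1, shape (2, d)
f2 = lambda u: np.array([np.tanh(u[0]) + u[1]])            # R^2 -> R^1
J2 = lambda u: np.array([[1 - np.tanh(u[0]) ** 2, 1.0]])   # Jacobian of f2, shape (1, 2)
f3 = lambda v: 0.5 * v[0] ** 2                             # R^1 -> R
g3 = lambda v: np.array([v[0]])                            # gradient of f3, shape (1,)

def grad_F(x):
    u = f1(x)
    v = f2(u)
    # Chain rule: grad F(x) = J1(x)^T J2(u)^T grad f3(v)
    return J1(x).T @ (J2(u).T @ g3(v))

print(grad_F(np.ones(4)))
# In the stochastic setting, u and v above are only available as noisy estimates,
# which makes the naive plug-in gradient estimator biased; the paper concerns
# algorithms that handle this at optimal sample complexity.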