Stochastic gradient descent with noise of machine learning type part I: Discrete time analysis

S Wojtowytsch - Journal of Nonlinear Science, 2023 - Springer
Stochastic gradient descent (SGD) is one of the most popular algorithms in modern machine
learning. The noise encountered in these applications is different from that in many …
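
For orientation, a minimal generic SGD update on an illustrative least-squares problem is sketched below; the data, step size, and iteration count are assumptions made for this example and are not taken from the paper, whose focus is the structure of the gradient noise.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative least-squares risk: R(theta) = E[(x . theta - y)^2]
X = rng.normal(size=(1000, 5))
theta_true = rng.normal(size=5)
y = X @ theta_true + 0.1 * rng.normal(size=1000)

theta = np.zeros(5)
step = 0.05
for n in range(2000):
    i = rng.integers(len(y))                   # draw one sample (the source of the noise)
    grad = 2.0 * (X[i] @ theta - y[i]) * X[i]  # unbiased stochastic gradient estimate
    theta = theta - step * grad                # SGD step: theta_{n+1} = theta_n - step * grad

print(np.linalg.norm(theta - theta_true))      # distance to the true parameter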

Fast convergence to non-isolated minima: four equivalent conditions for C² functions

Q Rebjock, N Boumal - Mathematical Programming, 2024 - Springer
Optimization algorithms can see their local convergence rates deteriorate when the Hessian
at the optimum is singular. These singularities are inescapable when the optima are non …
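
For context (the snippet does not list the paper's four conditions, and the ones below are not claimed to be that list), two standard conditions of this kind, both compatible with non-isolated minima and singular Hessians, are the Polyak-Łojasiewicz inequality and quadratic growth around the solution set, written here in our own notation with a constant \(\mu > 0\):

\[
  \|\nabla f(x)\|^2 \;\ge\; 2\mu\,\bigl(f(x) - f^*\bigr),
  \qquad
  f(x) - f^* \;\ge\; \frac{\mu}{2}\,\operatorname{dist}\bigl(x, \operatorname*{arg\,min} f\bigr)^2 .
\]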

Mathematical introduction to deep learning: methods, implementations, and theory

A Jentzen, B Kuckuck, P von Wurstemberger - arXiv preprint arXiv …, 2023 - arxiv.org
This book aims to provide an introduction to the topic of deep learning algorithms. We review
essential components of deep learning algorithms in full mathematical detail including …

On the existence of global minima and convergence analyses for gradient descent methods in the training of deep neural networks

A Jentzen, A Riekert - arXiv preprint arXiv:2112.09684, 2021 - arxiv.org
In this article we study fully-connected feedforward deep ReLU ANNs with an arbitrarily
large number of hidden layers and we prove convergence of the risk of the GD optimization …
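
For reference, the generic full-batch gradient descent recursion on the risk of an ANN, in our own notation (the paper's precise setting, learning rates, and risk functional are not reproduced here), reads

\[
  \theta_{n+1} \;=\; \theta_n - \gamma\, \nabla_\theta \mathcal{R}(\theta_n), \qquad \gamma > 0,
\]

where \(\mathcal{R}\) denotes the risk of the ReLU network with parameter vector \(\theta\).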

A proof of convergence for the gradient descent optimization method with random initializations in the training of neural networks with ReLU activation for piecewise …

A Jentzen, A Riekert - Journal of Machine Learning Research, 2022 - jmlr.org
Gradient descent (GD) type optimization methods are the standard instrument to train
artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Despite the …

Multilevel Picard approximations of high-dimensional semilinear partial differential equations with locally monotone coefficient functions

M Hutzenthaler, TA Nguyen - Applied Numerical Mathematics, 2022 - Elsevier
The full history recursive multilevel Picard approximation method for semilinear parabolic
partial differential equations (PDEs) is the only method which provably overcomes the curse …
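
As a point of reference, a prototypical semilinear parabolic PDE of the kind such approximation methods target can be written, in our own illustrative notation, as

\[
  \partial_t u(t,x) \;=\; \Delta_x u(t,x) + f\bigl(u(t,x)\bigr),
\]

with a nonlinearity \(f\) acting on the solution; the paper's locally monotone coefficient assumptions are not reproduced here.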

Existence, uniqueness, and convergence rates for gradient flows in the training of artificial neural networks with ReLU activation

S Eberle, A Jentzen, A Riekert, GS Weiss - arXiv preprint arXiv …, 2021 - arxiv.org
The training of artificial neural networks (ANNs) with rectified linear unit (ReLU) activation
via gradient descent (GD) type optimization schemes is nowadays a common industrially …
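
For orientation, the gradient flow counterpart of GD training, written in our own generic notation (not the paper's precise formulation), is the ODE

\[
  \theta'(t) \;=\; -\,\nabla_\theta \mathcal{R}\bigl(\theta(t)\bigr), \qquad t \ge 0,
\]

whose solutions GD type schemes discretize with positive step sizes.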

Convergence to good non-optimal critical points in the training of neural networks: Gradient descent optimization with one random initialization overcomes all bad non …

S Ibragimov, A Jentzen, A Riekert - arXiv preprint arXiv:2212.13111, 2022 - arxiv.org
Gradient descent (GD) methods for the training of artificial neural networks (ANNs) belong
nowadays to the most heavily employed computational schemes in the digital world. Despite …

Blow up phenomena for gradient descent optimization methods in the training of artificial neural networks

D Gallon, A Jentzen, F Lindner - arXiv preprint arXiv:2211.15641, 2022 - arxiv.org
In this article we investigate blow up phenomena for gradient descent optimization methods
in the training of artificial neural networks (ANNs). Our theoretical analysis is focused on …

A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

A Jentzen, A Riekert - Zeitschrift für angewandte Mathematik und Physik, 2022 - Springer
In this article we study the stochastic gradient descent (SGD) optimization method in the
training of fully connected feedforward artificial neural networks with ReLU activation. The …