Stochastic gradient descent with noise of machine learning type part I: Discrete time analysis

S Wojtowytsch - Journal of Nonlinear Science, 2023 - Springer
Stochastic gradient descent (SGD) is one of the most popular algorithms in modern machine
learning. The noise encountered in these applications is different from that in many …
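
For orientation, a minimal generic SGD update on an illustrative least-squares problem is sketched below; the data, step size, and iteration count are assumptions made for this example and are not taken from the paper, whose focus is the structure of the gradient noise.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative least-squares risk: R(theta) = E[(x . theta - y)^2]
X = rng.normal(size=(1000, 5))
theta_true = rng.normal(size=5)
y = X @ theta_true + 0.1 * rng.normal(size=1000)

theta = np.zeros(5)
step = 0.05
for n in range(2000):
    i = rng.integers(len(y))                   # draw one sample (the source of the noise)
    grad = 2.0 * (X[i] @ theta - y[i]) * X[i]  # unbiased stochastic gradient estimate
    theta = theta - step * grad                # SGD step: theta_{n+1} = theta_n - step * grad

print(np.linalg.norm(theta - theta_true))      # distance to the true parameter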

Fast convergence to non-isolated minima: four equivalent conditions for C² functions

Q Rebjock, N Boumal - Mathematical Programming, 2024 - Springer
Optimization algorithms can see their local convergence rates deteriorate when the Hessian
at the optimum is singular. These singularities are inescapable when the optima are non …
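
For context (the snippet does not list the paper's four conditions, and the ones below are not claimed to be that list), two standard conditions of this kind, both compatible with non-isolated minima and singular Hessians, are the Polyak-Łojasiewicz inequality and quadratic growth around the solution set, written here in our own notation with a constant \(\mu > 0\):

\[
  \|\nabla f(x)\|^2 \;\ge\; 2\mu\,\bigl(f(x) - f^*\bigr),
  \qquad
  f(x) - f^* \;\ge\; \frac{\mu}{2}\,\operatorname{dist}\bigl(x, \operatorname*{arg\,min} f\bigr)^2 .
\]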

Mathematical introduction to deep learning: methods, implementations, and theory

A Jentzen, B Kuckuck, P von Wurstemberger - arXiv preprint arXiv …, 2023 - arxiv.org
This book aims to provide an introduction to the topic of deep learning algorithms. We review
essential components of deep learning algorithms in full mathematical detail including …

On the existence of global minima and convergence analyses for gradient descent methods in the training of deep neural networks

A Jentzen, A Riekert - arXiv preprint arXiv:2112.09684, 2021 - arxiv.org
In this article we study fully-connected feedforward deep ReLU ANNs with an arbitrarily
large number of hidden layers and we prove convergence of the risk of the GD optimization …
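
For reference, the generic full-batch gradient descent recursion on the risk of an ANN, in our own notation (the paper's precise setting, learning rates, and risk functional are not reproduced here), reads

\[
  \theta_{n+1} \;=\; \theta_n - \gamma\, \nabla_\theta \mathcal{R}(\theta_n), \qquad \gamma > 0,
\]

where \(\mathcal{R}\) denotes the risk of the ReLU network with parameter vector \(\theta\).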

A proof of convergence for the gradient descent optimization method with random initializations in the training of neural networks with ReLU activation for piecewise …

A Jentzen, A Riekert - Journal of Machine Learning Research, 2022 - jmlr.org
Gradient descent (GD) type optimization methods are the standard instrument to train
artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Despite the …

Multilevel Picard approximations of high-dimensional semilinear partial differential equations with locally monotone coefficient functions

M Hutzenthaler, TA Nguyen - Applied Numerical Mathematics, 2022 - Elsevier
The full history recursive multilevel Picard approximation method for semilinear parabolic
partial differential equations (PDEs) is the only method which provably overcomes the curse …
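
As a point of reference, a prototypical semilinear parabolic PDE of the kind such approximation methods target can be written, in our own illustrative notation, as

\[
  \partial_t u(t,x) \;=\; \Delta_x u(t,x) + f\bigl(u(t,x)\bigr),
\]

with a nonlinearity \(f\) acting on the solution; the paper's locally monotone coefficient assumptions are not reproduced here.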

Existence, uniqueness, and convergence rates for gradient flows in the training of artificial neural networks with ReLU activation

S Eberle, A Jentzen, A Riekert, GS Weiss - arXiv preprint arXiv …, 2021 - arxiv.org
The training of artificial neural networks (ANNs) with rectified linear unit (ReLU) activation
via gradient descent (GD) type optimization schemes is nowadays a common industrially …
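
For orientation, the gradient flow counterpart of GD training, written in our own generic notation (not the paper's precise formulation), is the ODE

\[
  \theta'(t) \;=\; -\,\nabla_\theta \mathcal{R}\bigl(\theta(t)\bigr), \qquad t \ge 0,
\]

whose solutions GD type schemes discretize with positive step sizes.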

Convergence to good non-optimal critical points in the training of neural networks: Gradient descent optimization with one random initialization overcomes all bad non …

S Ibragimov, A Jentzen, A Riekert - arXiv preprint arXiv:2212.13111, 2022 - arxiv.org
Gradient descent (GD) methods for the training of artificial neural networks (ANNs) belong
nowadays to the most heavily employed computational schemes in the digital world. Despite …

Blow up phenomena for gradient descent optimization methods in the training of artificial neural networks

D Gallon, A Jentzen, F Lindner - arXiv preprint arXiv:2211.15641, 2022 - arxiv.org
In this article we investigate blow up phenomena for gradient descent optimization methods
in the training of artificial neural networks (ANNs). Our theoretical analysis is focused on …

A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

A Jentzen, A Riekert - Zeitschrift für angewandte Mathematik und Physik, 2022 - Springer
In this article we study the stochastic gradient descent (SGD) optimization method in the
training of fully connected feedforward artificial neural networks with ReLU activation. The …