Stochastic gradient descent with noise of machine learning type. Part I: Discrete time analysis
S Wojtowytsch - Journal of Nonlinear Science, 2023 - Springer
Stochastic gradient descent (SGD) is one of the most popular algorithms in modern machine
learning. The noise encountered in these applications is different from that in many …
Fast convergence to non-isolated minima: four equivalent conditions for functions
Optimization algorithms can see their local convergence rates deteriorate when the Hessian
at the optimum is singular. These singularities are inescapable when the optima are non …
Mathematical introduction to deep learning: methods, implementations, and theory
This book aims to provide an introduction to the topic of deep learning algorithms. We review
essential components of deep learning algorithms in full mathematical detail including …
On the existence of global minima and convergence analyses for gradient descent methods in the training of deep neural networks
In this article we study fully-connected feedforward deep ReLU ANNs with an arbitrarily
large number of hidden layers and we prove convergence of the risk of the GD optimization …
A proof of convergence for the gradient descent optimization method with random initializations in the training of neural networks with ReLU activation for piecewise …
Gradient descent (GD) type optimization methods are the standard instrument to train
artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Despite the …
Multilevel Picard approximations of high-dimensional semilinear partial differential equations with locally monotone coefficient functions
M Hutzenthaler, TA Nguyen - Applied Numerical Mathematics, 2022 - Elsevier
The full history recursive multilevel Picard approximation method for semilinear parabolic
partial differential equations (PDEs) is the only method which provably overcomes the curse …
Existence, uniqueness, and convergence rates for gradient flows in the training of artificial neural networks with ReLU activation
The training of artificial neural networks (ANNs) with rectified linear unit (ReLU) activation
via gradient descent (GD) type optimization schemes is nowadays a common industrially …
Convergence to good non-optimal critical points in the training of neural networks: Gradient descent optimization with one random initialization overcomes all bad non …
Gradient descent (GD) methods for the training of artificial neural networks (ANNs) belong
nowadays to the most heavily employed computational schemes in the digital world. Despite …
Blow up phenomena for gradient descent optimization methods in the training of artificial neural networks
In this article we investigate blow up phenomena for gradient descent optimization methods
in the training of artificial neural networks (ANNs). Our theoretical analysis is focused on …
A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions
In this article we study the stochastic gradient descent (SGD) optimization method in the
training of fully connected feedforward artificial neural networks with ReLU activation. The …