A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions

P Cheridito, A Jentzen, A Riekert, F Rossmannek - Journal of Complexity, 2022 - Elsevier
Gradient descent (GD) optimization algorithms are the standard ingredients that are used to
train artificial neural networks (ANNs). However, even in the case of the most basic variant of …
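For orientation, here is a minimal sketch of the kind of training dynamics such convergence results concern: plain gradient descent on the L2 risk of a one-hidden-layer ReLU network fitted to a constant target. The width, step size, input grid, and initialisation below are illustrative assumptions, not the paper's precise setting.

```python
# Sketch (illustrative, not the paper's exact setting): full-batch gradient descent
# on the mean squared error of a shallow ReLU network with a constant target y = c.
import numpy as np

rng = np.random.default_rng(0)
d, width, c, lr, steps = 1, 16, 2.0, 1e-2, 2000

x = np.linspace(-1.0, 1.0, 128).reshape(-1, d)      # training inputs
y = np.full((x.shape[0], 1), c)                      # constant target function

W1 = rng.normal(size=(d, width)); b1 = np.zeros(width)
W2 = rng.normal(size=(width, 1)); b2 = np.zeros(1)

for _ in range(steps):
    h = np.maximum(x @ W1 + b1, 0.0)                 # ReLU hidden layer
    err = h @ W2 + b2 - y                            # residual of the L2 risk
    # Gradients of the mean squared error via backpropagation.
    gW2 = h.T @ err / len(x)
    gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (h > 0)                      # ReLU subgradient
    gW1 = x.T @ dh / len(x)
    gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

h = np.maximum(x @ W1 + b1, 0.0)
print("risk after training:", float(((h @ W2 + b2 - y) ** 2).mean()))
```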

A proof of convergence for the gradient descent optimization method with random initializations in the training of neural networks with ReLU activation for piecewise …

A Jentzen, A Riekert - Journal of Machine Learning Research, 2022 - jmlr.org
Gradient descent (GD) type optimization methods are the standard instrument to train
artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Despite the …

Blow up phenomena for gradient descent optimization methods in the training of artificial neural networks

D Gallon, A Jentzen, F Lindner - arXiv preprint arXiv:2211.15641, 2022 - arxiv.org
In this article we investigate blow up phenomena for gradient descent optimization methods
in the training of artificial neural networks (ANNs). Our theoretical analysis is focused on …

A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

A Jentzen, A Riekert - Zeitschrift für angewandte Mathematik und Physik, 2022 - Springer
In this article we study the stochastic gradient descent (SGD) optimization method in the
training of fully connected feedforward artificial neural networks with ReLU activation. The …

Kinetic Langevin MCMC Sampling Without Gradient Lipschitz Continuity-the Strongly Convex Case

T Johnston, I Lytras, S Sabanis - Journal of Complexity, 2024 - Elsevier
In this article we consider sampling from log-concave distributions in a Hamiltonian setting,
without assuming that the objective gradient is globally Lipschitz. We propose two …
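As a point of reference, the sketch below discretises the kinetic (underdamped) Langevin SDE with a plain Euler–Maruyama step for a strongly convex potential; the Gaussian example, friction gamma, and step size lam are illustrative assumptions and not the schemes proposed in the paper.

```python
# Sketch (illustrative, not the paper's proposed schemes): Euler-Maruyama
# discretisation of the kinetic Langevin SDE
#   d(theta) = v dt,   dv = -gamma*v dt - grad U(theta) dt + sqrt(2*gamma) dB
# for a strongly convex potential U.
import numpy as np

rng = np.random.default_rng(0)
d, gamma, lam, n_steps = 2, 1.0, 1e-2, 50_000

def grad_U(theta):
    # U(theta) = 0.5*||theta||^2  ->  standard Gaussian target, strongly convex.
    return theta

theta = np.zeros(d)          # position
v = np.zeros(d)              # velocity (the Hamiltonian component)
samples = np.empty((n_steps, d))

for n in range(n_steps):
    xi = rng.standard_normal(d)
    g = grad_U(theta)
    theta, v = (theta + lam * v,
                v - lam * gamma * v - lam * g + np.sqrt(2.0 * gamma * lam) * xi)
    samples[n] = theta

print("sample mean:", samples[10_000:].mean(axis=0))   # close to 0 for this target
print("sample var :", samples[10_000:].var(axis=0))    # close to 1 for this target
```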

Non-asymptotic estimates for TUSLA algorithm for non-convex learning with applications to neural networks with ReLU activation function

DY Lim, A Neufeld, S Sabanis… - IMA Journal of numerical …, 2024 - academic.oup.com
We consider nonconvex stochastic optimization problems where the objective functions
have super-linearly growing and discontinuous stochastic gradients. In such a setting, we …

Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions

M Hutzenthaler, A Jentzen, K Pohl, A Riekert… - arXiv preprint arXiv …, 2021 - arxiv.org
In many numerical simulations stochastic gradient descent (SGD) type optimization methods
perform very effectively in the training of deep neural networks (DNNs), but to this day it …

Non-asymptotic convergence bounds for modified tamed unadjusted Langevin algorithm in non-convex setting

A Neufeld, Y Zhang - Journal of Mathematical Analysis and Applications, 2025 - Elsevier
We consider the problem of sampling from a high-dimensional target distribution π_β on R^d
with density proportional to θ ↦ e^{−βU(θ)} using explicit numerical schemes based on …
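A minimal sketch of the generic taming idea such schemes build on: the drift ∇U, which may grow super-linearly, is rescaled so that a single step cannot blow up. The double-well potential, beta, and step size below are illustrative assumptions, not the paper's modified algorithm.

```python
# Sketch (illustrative, not the paper's modified scheme): a generic tamed
# unadjusted Langevin step, where the drift grad U is rescaled by
# 1/(1 + lam*||grad U||) to control its super-linear growth.
import numpy as np

rng = np.random.default_rng(0)
d, beta, lam, n_steps = 2, 4.0, 1e-2, 100_000

def grad_U(theta):
    # Non-convex double-well U(theta) = 0.25*||theta||^4 - 0.5*||theta||^2.
    return (np.dot(theta, theta) - 1.0) * theta

theta = rng.standard_normal(d)
samples = np.empty((n_steps, d))

for n in range(n_steps):
    g = grad_U(theta)
    tamed_drift = g / (1.0 + lam * np.linalg.norm(g))   # taming normalisation
    theta = (theta - lam * tamed_drift
             + np.sqrt(2.0 * lam / beta) * rng.standard_normal(d))
    samples[n] = theta

print("mean squared radius of samples:", (samples[20_000:] ** 2).sum(axis=1).mean())
```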

Robust SGLD algorithm for solving non-convex distributionally robust optimisation problems

A Neufeld, MNC En, Y Zhang - arXiv preprint arXiv:2403.09532, 2024 - arxiv.org
In this paper we develop a Stochastic Gradient Langevin Dynamics (SGLD) algorithm
tailored for solving a certain class of non-convex distributionally robust optimisation …
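For context, a minimal sketch of the standard (non-robust) SGLD update on a synthetic least-squares problem; the data model, beta, step size, and batch size are illustrative assumptions, and the paper's distributionally robust variant is not reproduced here.

```python
# Sketch (illustrative, standard SGLD rather than the paper's robust variant):
#   theta_{n+1} = theta_n - lam * stochastic_grad(theta_n)
#                 + sqrt(2*lam/beta) * standard_normal_noise.
import numpy as np

rng = np.random.default_rng(0)
d, beta, lam, n_steps, batch = 3, 100.0, 1e-3, 20_000, 32

X = rng.standard_normal((1_000, d))                  # synthetic design matrix
y = X @ np.ones(d) + 0.1 * rng.standard_normal(1_000)

def stochastic_grad(theta):
    idx = rng.integers(0, len(X), size=batch)        # minibatch of the data
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ theta - yb) / batch          # least-squares gradient estimate

theta = np.zeros(d)
for _ in range(n_steps):
    theta = (theta - lam * stochastic_grad(theta)
             + np.sqrt(2.0 * lam / beta) * rng.standard_normal(d))

print("theta after SGLD:", theta)   # approximately recovers the all-ones vector
```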

Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks

DY Lim, S Sabanis - Journal of Machine Learning Research, 2024 - jmlr.org
We present a new class of Langevin-based algorithms, which overcomes many of the known
shortcomings of popular adaptive optimizers that are currently used for the fine-tuning of …