A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions

P Cheridito, A Jentzen, A Riekert, F Rossmannek - Journal of Complexity, 2022 - Elsevier
Gradient descent (GD) optimization algorithms are the standard ingredients that are used to
train artificial neural networks (ANNs). However, even in the case of the most basic variant of …
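For orientation, here is a minimal sketch of the kind of training dynamics such convergence results concern: plain gradient descent on the L2 risk of a one-hidden-layer ReLU network fitted to a constant target. The width, step size, input grid, and initialisation below are illustrative assumptions, not the paper's precise setting.

```python
# Sketch (illustrative, not the paper's exact setting): full-batch gradient descent
# on the mean squared error of a shallow ReLU network with a constant target y = c.
import numpy as np

rng = np.random.default_rng(0)
d, width, c, lr, steps = 1, 16, 2.0, 1e-2, 2000

x = np.linspace(-1.0, 1.0, 128).reshape(-1, d)      # training inputs
y = np.full((x.shape[0], 1), c)                      # constant target function

W1 = rng.normal(size=(d, width)); b1 = np.zeros(width)
W2 = rng.normal(size=(width, 1)); b2 = np.zeros(1)

for _ in range(steps):
    h = np.maximum(x @ W1 + b1, 0.0)                 # ReLU hidden layer
    err = h @ W2 + b2 - y                            # residual of the L2 risk
    # Gradients of the mean squared error via backpropagation.
    gW2 = h.T @ err / len(x)
    gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (h > 0)                      # ReLU subgradient
    gW1 = x.T @ dh / len(x)
    gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

h = np.maximum(x @ W1 + b1, 0.0)
print("risk after training:", float(((h @ W2 + b2 - y) ** 2).mean()))
```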

A proof of convergence for the gradient descent optimization method with random initializations in the training of neural networks with ReLU activation for piecewise …

A Jentzen, A Riekert - Journal of Machine Learning Research, 2022 - jmlr.org
Gradient descent (GD) type optimization methods are the standard instrument to train
artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Despite the …

Blow up phenomena for gradient descent optimization methods in the training of artificial neural networks

D Gallon, A Jentzen, F Lindner - arXiv preprint arXiv:2211.15641, 2022 - arxiv.org
In this article we investigate blow up phenomena for gradient descent optimization methods
in the training of artificial neural networks (ANNs). Our theoretical analysis is focused on …

A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

A Jentzen, A Riekert - Zeitschrift für angewandte Mathematik und Physik, 2022 - Springer
In this article we study the stochastic gradient descent (SGD) optimization method in the
training of fully connected feedforward artificial neural networks with ReLU activation. The …

Kinetic Langevin MCMC Sampling Without Gradient Lipschitz Continuity-the Strongly Convex Case

T Johnston, I Lytras, S Sabanis - Journal of Complexity, 2024 - Elsevier
In this article we consider sampling from log-concave distributions in a Hamiltonian setting,
without assuming that the objective gradient is globally Lipschitz. We propose two …
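As a point of reference, the sketch below discretises the kinetic (underdamped) Langevin SDE with a plain Euler–Maruyama step for a strongly convex potential; the Gaussian example, friction gamma, and step size lam are illustrative assumptions and not the schemes proposed in the paper.

```python
# Sketch (illustrative, not the paper's proposed schemes): Euler-Maruyama
# discretisation of the kinetic Langevin SDE
#   d(theta) = v dt,   dv = -gamma*v dt - grad U(theta) dt + sqrt(2*gamma) dB
# for a strongly convex potential U.
import numpy as np

rng = np.random.default_rng(0)
d, gamma, lam, n_steps = 2, 1.0, 1e-2, 50_000

def grad_U(theta):
    # U(theta) = 0.5*||theta||^2  ->  standard Gaussian target, strongly convex.
    return theta

theta = np.zeros(d)          # position
v = np.zeros(d)              # velocity (the Hamiltonian component)
samples = np.empty((n_steps, d))

for n in range(n_steps):
    xi = rng.standard_normal(d)
    g = grad_U(theta)
    theta, v = (theta + lam * v,
                v - lam * gamma * v - lam * g + np.sqrt(2.0 * gamma * lam) * xi)
    samples[n] = theta

print("sample mean:", samples[10_000:].mean(axis=0))   # close to 0 for this target
print("sample var :", samples[10_000:].var(axis=0))    # close to 1 for this target
```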

Non-asymptotic estimates for TUSLA algorithm for non-convex learning with applications to neural networks with ReLU activation function

DY Lim, A Neufeld, S Sabanis… - IMA Journal of numerical …, 2024 - academic.oup.com
We consider nonconvex stochastic optimization problems where the objective functions
have super-linearly growing and discontinuous stochastic gradients. In such a setting, we …

Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions

M Hutzenthaler, A Jentzen, K Pohl, A Riekert… - arXiv preprint arXiv …, 2021 - arxiv.org
In many numerical simulations stochastic gradient descent (SGD) type optimization methods
perform very effectively in the training of deep neural networks (DNNs), but to this day it …

Non-asymptotic convergence bounds for modified tamed unadjusted Langevin algorithm in non-convex setting

A Neufeld, Y Zhang - Journal of Mathematical Analysis and Applications, 2025 - Elsevier
We consider the problem of sampling from a high-dimensional target distribution π_β on R^d
with density proportional to θ ↦ e^{−βU(θ)} using explicit numerical schemes based on …
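A minimal sketch of the generic taming idea such schemes build on: the drift ∇U, which may grow super-linearly, is rescaled so that a single step cannot blow up. The double-well potential, beta, and step size below are illustrative assumptions, not the paper's modified algorithm.

```python
# Sketch (illustrative, not the paper's modified scheme): a generic tamed
# unadjusted Langevin step, where the drift grad U is rescaled by
# 1/(1 + lam*||grad U||) to control its super-linear growth.
import numpy as np

rng = np.random.default_rng(0)
d, beta, lam, n_steps = 2, 4.0, 1e-2, 100_000

def grad_U(theta):
    # Non-convex double-well U(theta) = 0.25*||theta||^4 - 0.5*||theta||^2.
    return (np.dot(theta, theta) - 1.0) * theta

theta = rng.standard_normal(d)
samples = np.empty((n_steps, d))

for n in range(n_steps):
    g = grad_U(theta)
    tamed_drift = g / (1.0 + lam * np.linalg.norm(g))   # taming normalisation
    theta = (theta - lam * tamed_drift
             + np.sqrt(2.0 * lam / beta) * rng.standard_normal(d))
    samples[n] = theta

print("mean squared radius of samples:", (samples[20_000:] ** 2).sum(axis=1).mean())
```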

Robust SGLD algorithm for solving non-convex distributionally robust optimisation problems

A Neufeld, MNC En, Y Zhang - arXiv preprint arXiv:2403.09532, 2024 - arxiv.org
In this paper we develop a Stochastic Gradient Langevin Dynamics (SGLD) algorithm
tailored for solving a certain class of non-convex distributionally robust optimisation …
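For context, a minimal sketch of the standard (non-robust) SGLD update on a synthetic least-squares problem; the data model, beta, step size, and batch size are illustrative assumptions, and the paper's distributionally robust variant is not reproduced here.

```python
# Sketch (illustrative, standard SGLD rather than the paper's robust variant):
#   theta_{n+1} = theta_n - lam * stochastic_grad(theta_n)
#                 + sqrt(2*lam/beta) * standard_normal_noise.
import numpy as np

rng = np.random.default_rng(0)
d, beta, lam, n_steps, batch = 3, 100.0, 1e-3, 20_000, 32

X = rng.standard_normal((1_000, d))                  # synthetic design matrix
y = X @ np.ones(d) + 0.1 * rng.standard_normal(1_000)

def stochastic_grad(theta):
    idx = rng.integers(0, len(X), size=batch)        # minibatch of the data
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ theta - yb) / batch          # least-squares gradient estimate

theta = np.zeros(d)
for _ in range(n_steps):
    theta = (theta - lam * stochastic_grad(theta)
             + np.sqrt(2.0 * lam / beta) * rng.standard_normal(d))

print("theta after SGLD:", theta)   # approximately recovers the all-ones vector
```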

Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks

DY Lim, S Sabanis - Journal of Machine Learning Research, 2024 - jmlr.org
We present a new class of Langevin-based algorithms, which overcomes many of the known
shortcomings of popular adaptive optimizers that are currently used for the fine-tuning of …