A proof of convergence for gradient descent in the training of artificial neural networks for constant target functions
Gradient descent (GD) optimization algorithms are the standard ingredients that are used to
train artificial neural networks (ANNs). However, even in the case of the most basic variant of …
A proof of convergence for the gradient descent optimization method with random initializations in the training of neural networks with ReLU activation for piecewise …
Gradient descent (GD) type optimization methods are the standard instrument to train
artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Despite the …
Blow up phenomena for gradient descent optimization methods in the training of artificial neural networks
In this article we investigate blow up phenomena for gradient descent optimization methods
in the training of artificial neural networks (ANNs). Our theoretical analysis is focused on …
A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions
In this article we study the stochastic gradient descent (SGD) optimization method in the
training of fully connected feedforward artificial neural networks with ReLU activation. The …
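As a rough, self-contained illustration of the setting named above (and not the construction used in the paper), the following sketch trains a one-hidden-layer ReLU network toward a constant target with plain minibatch SGD on the squared loss; the width 16, step size 1e-2, and target value 1.0 are arbitrary choices for the example.

    import numpy as np

    rng = np.random.default_rng(0)
    width, lr, target = 16, 1e-2, 1.0               # hypothetical choices for the example
    W = rng.normal(size=(width, 1))                 # hidden-layer weights
    b = np.zeros(width)                             # hidden-layer biases
    v = rng.normal(size=width) / np.sqrt(width)     # output-layer weights

    for step in range(5000):
        x = rng.uniform(-1.0, 1.0, size=(32, 1))    # minibatch of inputs
        h = np.maximum(x @ W.T + b, 0.0)            # ReLU features
        err = h @ v - target                        # residual w.r.t. the constant target
        act = (h > 0.0) * (err[:, None] * v)        # backpropagation through the ReLU
        v -= lr * h.T @ err / len(err)
        W -= lr * act.T @ x / len(err)
        b -= lr * act.mean(axis=0)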
Kinetic Langevin MCMC Sampling Without Gradient Lipschitz Continuity - the Strongly Convex Case
In this article we consider sampling from log-concave distributions in the Hamiltonian setting,
without assuming that the objective gradient is globally Lipschitz. We propose two …
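The entry above only states the setting; purely as orientation (and not as one of the schemes proposed in that work), a naive explicit step for the kinetic (underdamped) Langevin dynamics it refers to can be sketched as follows, with the friction gamma, inverse temperature beta, and step size h left as free parameters.

    import numpy as np

    def kinetic_langevin_step(theta, v, grad_U, h, gamma, beta, rng):
        # One explicit step for the kinetic Langevin SDE
        #   d theta = v dt,   d v = -gamma * v dt - grad U(theta) dt + sqrt(2*gamma/beta) dW.
        # Naive discretizations like this can be unstable when grad U is not
        # globally Lipschitz, which is exactly the regime the paper addresses.
        noise = rng.standard_normal(np.shape(v))
        v_new = v - h * (gamma * v + grad_U(theta)) + np.sqrt(2.0 * gamma * h / beta) * noise
        theta_new = theta + h * v_new
        return theta_new, v_new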
Non-asymptotic estimates for TUSLA algorithm for non-convex learning with applications to neural networks with ReLU activation function
We consider nonconvex stochastic optimization problems where the objective functions
have super-linearly growing and discontinuous stochastic gradients. In such a setting, we …
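The abstract above is cut off before the algorithm itself; as a loose sketch of the gradient-taming idea that such Langevin-type methods rely on when stochastic gradients grow super-linearly (the exact TUSLA taming factor and noise scaling are the ones given in the paper, not the classical choice used here), one update step might look like:

    import numpy as np

    def tamed_langevin_step(theta, stoch_grad, lam, beta, rng):
        # Divide the possibly super-linearly growing stochastic gradient by a
        # step-size-dependent factor so that a single update cannot blow up.
        g = stoch_grad(theta)
        tamed = g / (1.0 + np.sqrt(lam) * np.linalg.norm(g))
        noise = rng.standard_normal(np.shape(theta))
        return theta - lam * tamed + np.sqrt(2.0 * lam / beta) * noise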
Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions
In many numerical simulations stochastic gradient descent (SGD) type optimization methods
perform very effectively in the training of deep neural networks (DNNs) but till this day it …
Non-asymptotic convergence bounds for modified tamed unadjusted Langevin algorithm in non-convex setting
We consider the problem of sampling from a high-dimensional target distribution π_β on ℝ^d
with density proportional to θ ↦ e^{−βU(θ)} using explicit numerical schemes based on …
Robust SGLD algorithm for solving non-convex distributionally robust optimisation problems
In this paper we develop a Stochastic Gradient Langevin Dynamics (SGLD) algorithm
tailored for solving a certain class of non-convex distributionally robust optimisation …
Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks
DY Lim, S Sabanis - Journal of Machine Learning Research, 2024 - jmlr.org
We present a new class of Langevin-based algorithms, which overcomes many of the known
shortcomings of popular adaptive optimizers that are currently used for the fine-tuning of …