Painless stochastic gradient: Interpolation, line-search, and convergence rates
Recent works have shown that stochastic gradient descent (SGD) achieves the fast
convergence rates of full-batch gradient descent for over-parameterized models satisfying …
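As background on the mechanism in the title (a minimal sketch only, assuming a generic mini-batch setup; the function names, constants, and tolerance below are our own illustrative choices, not the paper's exact algorithm): SGD where each iteration's step size is chosen by backtracking until an Armijo-type sufficient-decrease condition holds on the sampled mini-batch loss.

    # Minimal sketch: SGD with a stochastic Armijo backtracking line search.
    # loss(x, batch) and grad(x, batch) are the mini-batch loss and gradient;
    # sample_batch() draws a fresh mini-batch. All names are illustrative.
    import numpy as np

    def sgd_armijo(loss, grad, x0, sample_batch, steps=100,
                   eta_max=1.0, c=0.1, beta=0.5):
        x = x0.copy()
        for _ in range(steps):
            batch = sample_batch()
            g = grad(x, batch)
            f0 = loss(x, batch)
            eta = eta_max
            # Shrink eta geometrically until sufficient decrease holds
            # on this mini-batch (Armijo condition), or eta gets tiny.
            while loss(x - eta * g, batch) > f0 - c * eta * np.dot(g, g) and eta > 1e-10:
                eta *= beta
            x = x - eta * g
        return x

Under interpolation (every mini-batch loss is minimized at the shared solution), a per-iteration stochastic line search of this kind is the step-size mechanism the title refers to.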
Backtracking gradient descent method and some applications in large scale optimisation. Part 2: Algorithms and experiments
TT Truong, HT Nguyen - Applied Mathematics & Optimization, 2021 - Springer
In this paper, we provide new results and algorithms (including backtracking versions of
Nesterov accelerated gradient and Momentum) which are more applicable to large scale …
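For reference (standard textbook forms, written in our own notation; the cited paper's contribution is choosing $\delta_n$ by backtracking rather than fixing it), the momentum (heavy-ball) and Nesterov accelerated gradient updates these variants build on are

$$v_{n+1} = \gamma\, v_n - \delta_n \nabla f(x_n), \qquad x_{n+1} = x_n + v_{n+1} \qquad \text{(momentum / heavy ball)},$$
$$y_n = x_n + \gamma\,(x_n - x_{n-1}), \qquad x_{n+1} = y_n - \delta_n \nabla f(y_n) \qquad \text{(Nesterov accelerated gradient)},$$

with momentum parameter $\gamma \in [0,1)$.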
Why line search when you can plane search? SO-friendly neural networks allow per-iteration optimization of learning and momentum rates for every layer
We introduce the class of SO-friendly neural networks, which includes several models used in practice, including networks with 2 layers of hidden weights where the number of inputs is …
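A toy illustration of the "plane search" idea (our own brute-force example, not the paper's method, which exploits the SO-friendly structure to solve this two-dimensional subproblem cheaply per layer): at each iteration, search jointly over candidate learning rates and momentum rates and keep the best pair.

    # Toy "plane search": per-iteration 2D search over (learning rate, momentum
    # rate) pairs, keeping the pair with the lowest objective value.
    # All names and grid values below are illustrative assumptions.
    import numpy as np

    def plane_search_step(f, grad_f, x, x_prev, alphas, betas):
        g = grad_f(x)
        best_val, best_x = np.inf, x
        for alpha in alphas:       # candidate learning rates
            for beta in betas:     # candidate momentum rates
                x_new = x - alpha * g + beta * (x - x_prev)
                val = f(x_new)
                if val < best_val:
                    best_val, best_x = val, x_new
        return best_x

    # Example: minimize the quadratic f(x) = 0.5 * ||x||^2.
    f = lambda x: 0.5 * np.dot(x, x)
    grad_f = lambda x: x
    x_prev = x = np.ones(3)
    for _ in range(20):
        x, x_prev = plane_search_step(f, grad_f, x, x_prev,
                                      alphas=[0.01, 0.1, 0.5, 1.0],
                                      betas=[0.0, 0.5, 0.9]), x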
Convergence to minima for the continuous version of backtracking gradient descent
TT Truong - arXiv preprint arXiv:1911.04221, 2019 - arxiv.org
The main result of this paper is: {\bf Theorem.} Let $f:\mathbb{R}^k\rightarrow\mathbb{R}$ be a $C^1$ function, so that $\nabla f$ is locally Lipschitz continuous. Assume moreover …
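For context (our own framing, not drawn from the truncated abstract): the classical continuous-time counterpart of the gradient descent iteration is the gradient flow

$$x'(t) = -\nabla f(x(t)),$$

and the title's "continuous version of backtracking gradient descent" is an analogue of this flow in which the backtracking step-size rule enters the dynamics.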
Some convergent results for Backtracking Gradient Descent method on Banach spaces
TT Truong - arXiv preprint arXiv:2001.05768, 2020 - arxiv.org
Our main result concerns the following condition: {\bf Condition C.} Let $X$ be a Banach space. A $C^1$ function $f: X\rightarrow\mathbb{R}$ satisfies Condition C if whenever …
Fast Forwarding Low-Rank Training
Parameter-efficient finetuning methods like low-rank adaptation (LoRA) aim to reduce the computational costs of finetuning pretrained Language Models (LMs). Enabled by these low …
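As background on LoRA itself (the standard formulation from the LoRA literature, not specific to this paper): a pretrained weight matrix $W \in \mathbb{R}^{d\times k}$ is kept frozen and the finetuning update is constrained to a low-rank product,

$$W' = W + \Delta W = W + BA, \qquad B \in \mathbb{R}^{d\times r},\; A \in \mathbb{R}^{r\times k},\; r \ll \min(d,k),$$

so that only the small factors $A$ and $B$ are trained.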
Adaptive Backtracking For Faster Optimization
Backtracking line search is foundational in numerical optimization. The basic idea is to adjust the step size of an algorithm by a constant factor until some chosen criterion (e.g., …
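The classical instance of this scheme is Armijo backtracking (textbook form, included here for context; $c$ and $\beta$ are the usual sufficient-decrease and shrinkage constants): starting from a trial step $\delta > 0$, repeatedly set $\delta \leftarrow \beta\delta$ with $\beta \in (0,1)$ until

$$f\big(x - \delta \nabla f(x)\big) \le f(x) - c\,\delta\,\|\nabla f(x)\|^2, \qquad c \in (0,1),$$

and then take the gradient step with the accepted $\delta$.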
Unconstrained optimisation on Riemannian manifolds
TT Truong - arXiv preprint arXiv:2008.11091, 2020 - arxiv.org
In this paper, we give explicit descriptions of versions of (Local-) Backtracking Gradient Descent and New Q-Newton's method adapted to the Riemannian setting. Here are some easy to …
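A minimal concrete instance of Riemannian gradient descent (our own toy example on the unit sphere, not the (Local-) Backtracking or New Q-Newton's variants from the paper): project the Euclidean gradient onto the tangent space, take a step, then retract back to the manifold by normalization.

    # Toy Riemannian gradient descent on the unit sphere (illustrative only).
    import numpy as np

    def sphere_gd(grad_f, x0, step=0.1, iters=100):
        x = x0 / np.linalg.norm(x0)
        for _ in range(iters):
            g = grad_f(x)
            rg = g - np.dot(g, x) * x     # Riemannian gradient: tangent-space projection
            x = x - step * rg             # step along the negative Riemannian gradient
            x = x / np.linalg.norm(x)     # retraction: renormalize onto the sphere
        return x

    # Example: minimize f(x) = x^T A x over the sphere; the iterates approach
    # the eigenvector of A with the smallest eigenvalue.
    A = np.diag([3.0, 2.0, 1.0])
    x_min = sphere_gd(lambda x: 2 * A @ x, np.array([1.0, 1.0, 1.0]))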
Backtracking gradient descent allowing unbounded learning rates
TT Truong - arXiv preprint arXiv:2001.02005, 2020 - arxiv.org
In unconstrained optimisation on a Euclidean space, to prove convergence in Gradient Descent processes (GD) $x_{n+1}=x_n-\delta_n\nabla f(x_n)$, it is usually required that …
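For context on the kind of requirement being relaxed (a textbook fact, not quoted from the truncated abstract): when $\nabla f$ is globally $L$-Lipschitz, the descent lemma gives

$$f(x_{n+1}) \le f(x_n) - \delta_n\Big(1 - \tfrac{L\,\delta_n}{2}\Big)\,\|\nabla f(x_n)\|^2,$$

so standard analyses impose an a priori bound such as $\sup_n \delta_n < 2/L$; the title indicates that the backtracking analysis can dispense with such a bound.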
Inertial Newton algorithms avoiding strict saddle points
C Castera - Journal of Optimization Theory and Applications, 2023 - Springer
We study the asymptotic behavior of second-order algorithms mixing Newton's method and
inertial gradient descent in non-convex landscapes. We show that, despite the Newtonian …
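As general background on this family (a well-known continuous-time model combining inertia with Newton-type curvature, stated here only as context rather than as this paper's exact dynamics): the dynamical inertial Newton system

$$\ddot{x}(t) + \alpha\,\dot{x}(t) + \beta\,\nabla^2 f(x(t))\,\dot{x}(t) + \nabla f(x(t)) = 0, \qquad \alpha, \beta > 0,$$

mixes heavy-ball damping with Hessian-driven damping, and discretizations of such systems yield second-order algorithms of the kind studied here.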