Optimization for deep learning: An overview

RY Sun - Journal of the Operations Research Society of China, 2020 - Springer
Optimization is a critical component in deep learning. We believe optimization for neural
networks is an interesting topic for theoretical research for several reasons. First, its …

On efficient training of large-scale deep learning models: A literature review

L Shen, Y Sun, Z Yu, L Ding, X Tian, D Tao - arXiv preprint arXiv …, 2023 - arxiv.org
The field of deep learning has witnessed significant progress, particularly in computer vision
(CV), natural language processing (NLP), and speech. The use of large-scale models …

Optimization for deep learning: theory and algorithms

R Sun - arXiv preprint arXiv:1912.08957, 2019 - arxiv.org
When and why can a neural network be successfully trained? This article provides an
overview of optimization algorithms and theory for training neural networks. First, we discuss …

A multi-batch L-BFGS method for machine learning

AS Berahas, J Nocedal… - Advances in Neural …, 2016 - proceedings.neurips.cc
The question of how to parallelize the stochastic gradient descent (SGD) method has
received much attention in the literature. In this paper, we focus instead on batch methods …
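
The multi-batch scheme itself is not reproduced here, but the following is a minimal sketch of the classical L-BFGS two-loop recursion that such a method builds on; in the multi-batch setting the curvature pairs (s, y) would be formed on the overlap between consecutive batches so that gradient differences stay consistent. Names and structure are illustrative, not the authors' implementation.

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Return the quasi-Newton direction -H_k @ grad via the two-loop recursion."""
    q = grad.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest curvature pair to oldest.
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        alpha = rho * (s @ q)
        alphas.append(alpha)
        q -= alpha * y
    if s_list:
        s, y = s_list[-1], y_list[-1]
        q *= (s @ y) / (y @ y)  # standard initial scaling H_0 = gamma_k * I
    # Second loop: oldest curvature pair to newest.
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * (y @ q)
        q += (alpha - beta) * s
    return -q
```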

SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models

Z Ramzi, F Mannel, S Bai, JL Starck, P Ciuciu… - arXiv preprint arXiv …, 2021 - arxiv.org
In recent years, implicit deep learning has emerged as a method to increase the effective
depth of deep neural networks. While their training is memory-efficient, they are still …
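
For context, implicit models define their output as a fixed point $z^{\star} = f_{\theta}(z^{\star}, x)$ and differentiate through it with the implicit function theorem,

$$\frac{\partial z^{\star}}{\partial \theta} \;=\; \Bigl(I - \frac{\partial f_{\theta}}{\partial z}\Big|_{z^{\star}}\Bigr)^{-1} \frac{\partial f_{\theta}}{\partial \theta}\Big|_{z^{\star}},$$

so the backward pass normally requires a linear solve with $I - \partial f_{\theta}/\partial z$. As the title indicates, SHINE instead reuses the inverse estimate accumulated during the forward-pass root-finding to approximate this solve; the identity above is the standard one, not a detail specific to the paper.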

Searching for optimal per-coordinate step-sizes with multidimensional backtracking

F Kunstner, V Sanches Portella… - Advances in Neural …, 2023 - proceedings.neurips.cc
The backtracking line-search is an effective technique to automatically tune the step-size in
smooth optimization. It guarantees similar performance to using the theoretically optimal …
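
As a point of reference, the scalar backtracking line-search that the paper generalizes to per-coordinate step-sizes can be sketched as follows (Armijo sufficient-decrease test; constants and names here are illustrative, not taken from the paper):

```python
import numpy as np

def backtracking_step(f, grad_f, x, alpha0=1.0, shrink=0.5, c=1e-4):
    """Shrink a scalar step-size until the Armijo condition holds along -grad."""
    g = grad_f(x)
    alpha = alpha0
    # Armijo: f(x - alpha*g) <= f(x) - c * alpha * ||g||^2
    while f(x - alpha * g) > f(x) - c * alpha * (g @ g):
        alpha *= shrink
    return x - alpha * g, alpha
```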

Fast and furious convergence: Stochastic second order methods under interpolation

SY Meng, S Vaswani, IH Laradji… - International …, 2020 - proceedings.mlr.press
We consider stochastic second-order methods for minimizing smooth and strongly-convex
functions under an interpolation condition satisfied by over-parameterized models. Under …
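
A hedged sketch of the kind of step such methods take (a sub-sampled, damped Newton step with gradient and curvature estimated on the same mini-batch); this illustrates the setting, not the paper's exact algorithm, and `grad_batch`/`hess_batch` are assumed mini-batch oracles:

```python
import numpy as np

def subsampled_newton_step(grad_batch, hess_batch, w, batch, lr=1.0, damping=1e-8):
    """One damped Newton step using mini-batch gradient and Hessian estimates."""
    g = grad_batch(w, batch)
    H = hess_batch(w, batch)
    direction = np.linalg.solve(H + damping * np.eye(w.size), g)
    return w - lr * direction
```

Under interpolation, every mini-batch loss is minimized at the same point as the full objective, which is what allows such stochastic second-order steps to retain fast, Newton-like convergence.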

On the Convergence Rate of Quasi-Newton Methods on Strongly Convex Functions with Lipschitz Gradient

V Krutikov, E Tovbis, P Stanimirović, L Kazakovtsev - Mathematics, 2023 - mdpi.com
The main results on the convergence rate of quasi-Newton minimization methods have been
obtained under the assumption that the method operates in a neighborhood of the extremum …
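
For reference, the function class in question is the standard one: $f$ is $\mu$-strongly convex with $L$-Lipschitz gradient, i.e.

$$(\nabla f(x) - \nabla f(y))^{\top}(x - y) \;\ge\; \mu \|x - y\|^2, \qquad \|\nabla f(x) - \nabla f(y)\| \;\le\; L \|x - y\|,$$

with condition number $\kappa = L/\mu$; these are the textbook definitions, not assumptions particular to this paper.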

Towards explicit superlinear convergence rate for SR1

H Ye, D Lin, X Chang, Z Zhang - Mathematical Programming, 2023 - Springer
We study the convergence rate of the famous Symmetric Rank-1 (SR1) algorithm, which has
wide applications in different scenarios. Although it has been extensively investigated, SR1 …
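
The SR1 update whose rate is analyzed is the standard symmetric rank-one formula (stated here for the Hessian approximation $B_k$, with $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$):

$$B_{k+1} \;=\; B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^{\top}}{(y_k - B_k s_k)^{\top} s_k},$$

which, unlike BFGS, applies a rank-one correction and need not preserve positive definiteness.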

Doubly adaptive scaled algorithm for machine learning using second-order information

M Jahani, S Rusakov, Z Shi, P Richtárik… - arXiv preprint arXiv …, 2021 - arxiv.org
We present a novel adaptive optimization algorithm for large-scale machine learning
problems. Equipped with a low-cost estimate of local curvature and Lipschitz smoothness …
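
One low-cost way to obtain such a local curvature estimate, of the kind diagonal second-order methods commonly use, is a Hutchinson-style estimate of the Hessian diagonal. The sketch below is illustrative and assumes an autodiff Hessian-vector-product oracle `hvp(w, v)`; it is not the paper's algorithm, which additionally couples the curvature estimate with an adaptive step-size:

```python
import numpy as np

def diag_hessian_estimate(hvp, w, n_samples=10, seed=None):
    """Estimate diag(H) via E[z * (H z)] with Rademacher probe vectors z."""
    rng = np.random.default_rng(seed)
    d = np.zeros_like(w)
    for _ in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=w.shape)
        d += z * hvp(w, z)  # elementwise product isolates the diagonal in expectation
    return d / n_samples
```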