Optimization for deep learning: An overview
RY Sun - Journal of the Operations Research Society of China, 2020 - Springer
Optimization is a critical component in deep learning. We think optimization for neural
networks is an interesting topic for theoretical research for several reasons. First, its …
On efficient training of large-scale deep learning models: A literature review
The field of deep learning has witnessed significant progress, particularly in computer vision
(CV), natural language processing (NLP), and speech. The use of large-scale models …
Optimization for deep learning: theory and algorithms
R Sun - arXiv preprint arXiv:1912.08957, 2019 - arxiv.org
When and why can a neural network be successfully trained? This article provides an
overview of optimization algorithms and theory for training neural networks. First, we discuss …
A multi-batch L-BFGS method for machine learning
AS Berahas, J Nocedal… - Advances in Neural …, 2016 - proceedings.neurips.cc
The question of how to parallelize the stochastic gradient descent (SGD) method has
received much attention in the literature. In this paper, we focus instead on batch methods …
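As background, a minimal sketch of the standard L-BFGS two-loop recursion that such batch methods build on, assuming a history of curvature pairs (s_i, y_i) is already stored; this is generic L-BFGS, not the paper's multi-batch scheme.

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Two-loop recursion: compute the quasi-Newton direction from stored (s, y) pairs."""
    q = grad.copy()
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: most recent pair to oldest
    for s, y, rho in zip(reversed(s_list), reversed(y_list), reversed(rhos)):
        alpha = rho * np.dot(s, q)
        q -= alpha * y
        alphas.append(alpha)
    # Scale by the standard initial inverse-Hessian guess gamma_k * I
    s_last, y_last = s_list[-1], y_list[-1]
    r = (np.dot(s_last, y_last) / np.dot(y_last, y_last)) * q
    # Second loop: oldest pair to most recent
    for s, y, rho, alpha in zip(s_list, y_list, rhos, reversed(alphas)):
        beta = rho * np.dot(y, r)
        r += (alpha - beta) * s
    return -r  # search direction approximating -H^{-1} grad
```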
SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models
In recent years, implicit deep learning has emerged as a method to increase the effective
depth of deep neural networks. While implicit models are memory-efficient to train, they are still …
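For context, a toy sketch of the kind of implicit layer these methods address: the layer's output is a fixed point found by iteration rather than by stacking explicit layers. The names and the tanh update below are our own assumptions for illustration; this is not SHINE's inverse-sharing mechanism.

```python
import numpy as np

def implicit_layer(x, W, U, max_iter=100, tol=1e-6):
    """Toy implicit layer: return a fixed point z* of z = tanh(W @ z + U @ x)."""
    z = np.zeros(W.shape[0])
    for _ in range(max_iter):
        z_next = np.tanh(W @ z + U @ x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

# The iteration converges when W is a contraction (small spectral norm); only the
# final z needs to be stored, which is what makes implicit models memory-efficient.
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((8, 8))
U = rng.standard_normal((8, 4))
z_star = implicit_layer(rng.standard_normal(4), W, U)
```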
Searching for optimal per-coordinate step-sizes with multidimensional backtracking
F Kunstner, V Sanches Portella… - Advances in Neural …, 2023 - proceedings.neurips.cc
The backtracking line-search is an effective technique to automatically tune the step-size in
smooth optimization. It guarantees similar performance to using the theoretically optimal …
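A minimal sketch of the classical single step-size backtracking (Armijo) line-search that the paper generalizes to per-coordinate step-sizes; this is the standard scalar version, not the multidimensional method itself.

```python
import numpy as np

def backtracking_step(f, grad_f, x, alpha0=1.0, shrink=0.5, c=1e-4):
    """Armijo backtracking: shrink the step-size until sufficient decrease holds."""
    g = grad_f(x)
    d = -g                                   # steepest-descent direction
    alpha = alpha0
    while f(x + alpha * d) > f(x) + c * alpha * g.dot(d):
        alpha *= shrink
    return x + alpha * d, alpha

# Example on a simple quadratic: f(x) = 0.5 * ||x||^2
f = lambda x: 0.5 * x @ x
grad_f = lambda x: x
x_next, alpha = backtracking_step(f, grad_f, np.array([3.0, -4.0]))
```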
Fast and furious convergence: Stochastic second order methods under interpolation
We consider stochastic second-order methods for minimizing smooth and strongly-convex
functions under an interpolation condition satisfied by over-parameterized models. Under …
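To make the setting concrete, a hedged sketch of a sub-sampled (mini-batch) Newton step of the kind such analyses consider; grad_batch and hess_batch are assumed callables returning mini-batch derivatives, and the unit step reflects the interpolation regime rather than the paper's exact update.

```python
import numpy as np

def minibatch_newton_step(x, grad_batch, hess_batch, damping=1e-8):
    """One stochastic Newton step using the gradient and Hessian of a mini-batch."""
    g = grad_batch(x)
    H = hess_batch(x)
    # Solve (H + damping * I) d = -g; the damping guards against ill-conditioning
    d = np.linalg.solve(H + damping * np.eye(x.size), -g)
    return x + d
```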
On the Convergence Rate of Quasi-Newton Methods on Strongly Convex Functions with Lipschitz Gradient
V Krutikov, E Tovbis, P Stanimirović, L Kazakovtsev - Mathematics, 2023 - mdpi.com
The main results on the convergence rate of quasi-Newton minimization methods were
obtained under the assumption that the method operates in a neighborhood of the extremum …
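For reference, the classical BFGS update that such convergence-rate results concern, written in standard notation with $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$ (notation ours, not quoted from the paper):

```latex
B_{k+1} = B_k - \frac{B_k s_k s_k^{\top} B_k}{s_k^{\top} B_k s_k} + \frac{y_k y_k^{\top}}{y_k^{\top} s_k}
```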
Towards explicit superlinear convergence rate for SR1
We study the convergence rate of the well-known Symmetric Rank-1 (SR1) algorithm, which is
widely used in practice. Although it has been extensively investigated, SR1 …
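For reference, the SR1 update of the Hessian approximation in standard notation, with $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$ (notation ours):

```latex
B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^{\top}}{(y_k - B_k s_k)^{\top} s_k}
```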
Doubly adaptive scaled algorithm for machine learning using second-order information
We present a novel adaptive optimization algorithm for large-scale machine learning
problems. Equipped with a low-cost estimate of local curvature and Lipschitz smoothness …
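As an illustration of what a low-cost local-curvature estimate can look like (not the paper's algorithm), a Barzilai-Borwein-style step-size built from successive iterates and gradients:

```python
import numpy as np

def bb_step_size(x_prev, x_curr, g_prev, g_curr, fallback=1e-3, eps=1e-16):
    """BB1 step-size: reciprocal of a secant estimate of curvature along the last step."""
    s = x_curr - x_prev                       # displacement in parameters
    y = g_curr - g_prev                       # displacement in gradients
    curvature = y.dot(s) / max(s.dot(s), eps) # secant (Rayleigh-quotient-like) estimate
    if curvature <= 0:                        # safeguard for non-convex regions
        return fallback
    return 1.0 / curvature
```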