Optimization for deep learning: An overview

RY Sun - Journal of the Operations Research Society of China, 2020 - Springer
Optimization is a critical component in deep learning. We believe optimization for neural
networks is an interesting topic for theoretical research for several reasons. First, its …

On efficient training of large-scale deep learning models: A literature review

L Shen, Y Sun, Z Yu, L Ding, X Tian, D Tao - arXiv preprint arXiv …, 2023 - arxiv.org
The field of deep learning has witnessed significant progress, particularly in computer vision
(CV), natural language processing (NLP), and speech. The use of large-scale models …

Optimization for deep learning: theory and algorithms

R Sun - arXiv preprint arXiv:1912.08957, 2019 - arxiv.org
When and why can a neural network be successfully trained? This article provides an
overview of optimization algorithms and theory for training neural networks. First, we discuss …

A multi-batch L-BFGS method for machine learning

AS Berahas, J Nocedal… - Advances in Neural …, 2016 - proceedings.neurips.cc
The question of how to parallelize the stochastic gradient descent (SGD) method has
received much attention in the literature. In this paper, we focus instead on batch methods …
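
The multi-batch scheme itself is not reproduced here, but the following is a minimal sketch of the classical L-BFGS two-loop recursion that such a method builds on; in the multi-batch setting the curvature pairs (s, y) would be formed on the overlap between consecutive batches so that gradient differences stay consistent. Names and structure are illustrative, not the authors' implementation.

```python
import numpy as np

def lbfgs_direction(grad, s_list, y_list):
    """Return the quasi-Newton direction -H_k @ grad via the two-loop recursion."""
    q = grad.copy()
    rhos = [1.0 / (y @ s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest curvature pair to oldest.
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        alpha = rho * (s @ q)
        alphas.append(alpha)
        q -= alpha * y
    if s_list:
        s, y = s_list[-1], y_list[-1]
        q *= (s @ y) / (y @ y)  # standard initial scaling H_0 = gamma_k * I
    # Second loop: oldest curvature pair to newest.
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * (y @ q)
        q += (alpha - beta) * s
    return -q
```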

SHINE: SHaring the INverse Estimate from the forward pass for bi-level optimization and implicit models

Z Ramzi, F Mannel, S Bai, JL Starck, P Ciuciu… - arXiv preprint arXiv …, 2021 - arxiv.org
In recent years, implicit deep learning has emerged as a method to increase the effective
depth of deep neural networks. While their training is memory-efficient, they are still …
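
For context, implicit models define their output as a fixed point $z^{\star} = f_{\theta}(z^{\star}, x)$ and differentiate through it with the implicit function theorem,

$$\frac{\partial z^{\star}}{\partial \theta} \;=\; \Bigl(I - \frac{\partial f_{\theta}}{\partial z}\Big|_{z^{\star}}\Bigr)^{-1} \frac{\partial f_{\theta}}{\partial \theta}\Big|_{z^{\star}},$$

so the backward pass normally requires a linear solve with $I - \partial f_{\theta}/\partial z$. As the title indicates, SHINE instead reuses the inverse estimate accumulated during the forward-pass root-finding to approximate this solve; the identity above is the standard one, not a detail specific to the paper.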

Searching for optimal per-coordinate step-sizes with multidimensional backtracking

F Kunstner, V Sanches Portella… - Advances in Neural …, 2023 - proceedings.neurips.cc
The backtracking line-search is an effective technique to automatically tune the step-size in
smooth optimization. It guarantees similar performance to using the theoretically optimal …
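
As a point of reference, the scalar backtracking line-search that the paper generalizes to per-coordinate step-sizes can be sketched as follows (Armijo sufficient-decrease test; constants and names here are illustrative, not taken from the paper):

```python
import numpy as np

def backtracking_step(f, grad_f, x, alpha0=1.0, shrink=0.5, c=1e-4):
    """Shrink a scalar step-size until the Armijo condition holds along -grad."""
    g = grad_f(x)
    alpha = alpha0
    # Armijo: f(x - alpha*g) <= f(x) - c * alpha * ||g||^2
    while f(x - alpha * g) > f(x) - c * alpha * (g @ g):
        alpha *= shrink
    return x - alpha * g, alpha
```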

Fast and furious convergence: Stochastic second order methods under interpolation

SY Meng, S Vaswani, IH Laradji… - International …, 2020 - proceedings.mlr.press
We consider stochastic second-order methods for minimizing smooth and strongly-convex
functions under an interpolation condition satisfied by over-parameterized models. Under …
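
A hedged sketch of the kind of step such methods take (a sub-sampled, damped Newton step with gradient and curvature estimated on the same mini-batch); this illustrates the setting, not the paper's exact algorithm, and `grad_batch`/`hess_batch` are assumed mini-batch oracles:

```python
import numpy as np

def subsampled_newton_step(grad_batch, hess_batch, w, batch, lr=1.0, damping=1e-8):
    """One damped Newton step using mini-batch gradient and Hessian estimates."""
    g = grad_batch(w, batch)
    H = hess_batch(w, batch)
    direction = np.linalg.solve(H + damping * np.eye(w.size), g)
    return w - lr * direction
```

Under interpolation, every mini-batch loss is minimized at the same point as the full objective, which is what allows such stochastic second-order steps to retain fast, Newton-like convergence.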

On the Convergence Rate of Quasi-Newton Methods on Strongly Convex Functions with Lipschitz Gradient

V Krutikov, E Tovbis, P Stanimirović, L Kazakovtsev - Mathematics, 2023 - mdpi.com
The main results on the convergence rate of quasi-Newton minimization methods have been
obtained under the assumption that the method operates in a neighborhood of the extremum …
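
For reference, the function class in question is the standard one: $f$ is $\mu$-strongly convex with $L$-Lipschitz gradient, i.e.

$$(\nabla f(x) - \nabla f(y))^{\top}(x - y) \;\ge\; \mu \|x - y\|^2, \qquad \|\nabla f(x) - \nabla f(y)\| \;\le\; L \|x - y\|,$$

with condition number $\kappa = L/\mu$; these are the textbook definitions, not assumptions particular to this paper.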

Towards explicit superlinear convergence rate for SR1

H Ye, D Lin, X Chang, Z Zhang - Mathematical Programming, 2023 - Springer
We study the convergence rate of the famous Symmetric Rank-1 (SR1) algorithm, which has
wide applications in different scenarios. Although it has been extensively investigated, SR1 …
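
The SR1 update whose rate is analyzed is the standard symmetric rank-one formula (stated here for the Hessian approximation $B_k$, with $s_k = x_{k+1} - x_k$ and $y_k = \nabla f(x_{k+1}) - \nabla f(x_k)$):

$$B_{k+1} \;=\; B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^{\top}}{(y_k - B_k s_k)^{\top} s_k},$$

which, unlike BFGS, applies a rank-one correction and need not preserve positive definiteness.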

Doubly adaptive scaled algorithm for machine learning using second-order information

M Jahani, S Rusakov, Z Shi, P Richtárik… - arXiv preprint arXiv …, 2021 - arxiv.org
We present a novel adaptive optimization algorithm for large-scale machine learning
problems. Equipped with a low-cost estimate of local curvature and Lipschitz smoothness …
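
One low-cost way to obtain such a local curvature estimate, of the kind diagonal second-order methods commonly use, is a Hutchinson-style estimate of the Hessian diagonal. The sketch below is illustrative and assumes an autodiff Hessian-vector-product oracle `hvp(w, v)`; it is not the paper's algorithm, which additionally couples the curvature estimate with an adaptive step-size:

```python
import numpy as np

def diag_hessian_estimate(hvp, w, n_samples=10, seed=None):
    """Estimate diag(H) via E[z * (H z)] with Rademacher probe vectors z."""
    rng = np.random.default_rng(seed)
    d = np.zeros_like(w)
    for _ in range(n_samples):
        z = rng.choice([-1.0, 1.0], size=w.shape)
        d += z * hvp(w, z)  # elementwise product isolates the diagonal in expectation
    return d / n_samples
```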