Newton-type methods for non-convex optimization under inexact Hessian information

P Xu, F Roosta, MW Mahoney - Mathematical Programming, 2020 - Springer
We consider variants of trust-region and adaptive cubic regularization methods for non-convex
optimization, in which the Hessian matrix is approximated. Under certain condition …
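To make the inexact-Hessian setting concrete, the sketch below shows one step of cubic regularization in which the exact Hessian is replaced by a subsampled estimate; the sampling scheme, the crude gradient-descent sub-problem solver, and all function names are illustrative assumptions rather than the paper's algorithm.

    import numpy as np

    def subsampled_hessian(per_sample_hessian, n_samples, batch_size, rng):
        # Inexact Hessian: average per-sample Hessians over a random subsample
        # (assumed interface; per_sample_hessian(i) returns the i-th Hessian).
        idx = rng.choice(n_samples, size=batch_size, replace=False)
        return sum(per_sample_hessian(i) for i in idx) / batch_size

    def cubic_regularized_step(g, H, sigma, iters=500, lr=0.05):
        # Approximately minimize the cubic model
        #   m(s) = g.s + 0.5 s'Hs + (sigma/3) ||s||^3
        # by gradient descent on m; real implementations use sharper solvers.
        s = np.zeros_like(g)
        for _ in range(iters):
            grad_m = g + H @ s + sigma * np.linalg.norm(s) * s
            s -= lr * grad_m
        return s

A full method would additionally accept or reject the step and adapt sigma from the ratio of actual to predicted decrease.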

Distributed Second-Order Optimization using Kronecker-Factored Approximations

J Ba, RB Grosse, J Martens - ICLR (Poster), 2017 - jimmylba.github.io
As more computational resources become available, machine learning researchers train
ever larger neural networks on millions of data points using stochastic gradient descent …
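The Kronecker-factored idea in the title can be previewed for a single fully-connected layer as below; shapes, the damping constant, and the function name are assumptions for illustration, and the paper's distributed training machinery is not shown.

    import numpy as np

    def kfac_precondition(dW, acts, grads_out, damping=1e-3):
        # dW:        (out, in)  gradient of the loss w.r.t. the layer weight
        # acts:      (batch, in)  layer inputs a
        # grads_out: (batch, out) back-propagated pre-activation gradients g
        # The curvature block is approximated by the Kronecker product A ⊗ G,
        # with A = E[a a^T] and G = E[g g^T], so the preconditioned gradient is
        # (A ⊗ G)^{-1} vec(dW) = vec(G^{-1} dW A^{-1}).
        n = acts.shape[0]
        A = acts.T @ acts / n + damping * np.eye(acts.shape[1])
        G = grads_out.T @ grads_out / n + damping * np.eye(grads_out.shape[1])
        return np.linalg.solve(G, dW) @ np.linalg.inv(A)

Inverting the two small Kronecker factors is far cheaper than inverting the full (out·in) x (out·in) curvature matrix, which is what makes the approximation attractive at scale.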

Inexact non-convex Newton-type methods

Z Yao, P Xu, F Roosta-Khorasani… - arXiv preprint arXiv …, 2018 - arxiv.org
For solving large-scale non-convex problems, we propose inexact variants of trust region
and adaptive cubic regularization methods, which, to increase efficiency, incorporate various …
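For the trust-region variant, the inner problem is a quadratic model minimized inside a norm ball; a standard way to solve it approximately while touching the (possibly inexact) Hessian only through matrix-vector products is Steihaug-Toint truncated conjugate gradient, sketched below under assumed names and tolerances, not as the paper's implementation.

    import numpy as np

    def to_boundary(s, d, delta):
        # Positive tau with ||s + tau d|| = delta.
        a, b, c = d @ d, 2 * s @ d, s @ s - delta ** 2
        tau = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
        return tau * d

    def steihaug_cg(hvp, g, delta, tol=1e-6, max_iter=100):
        # Truncated CG for  min_s g.s + 0.5 s'Hs  s.t.  ||s|| <= delta,
        # where hvp(v) returns H v (the Hessian itself is never formed).
        s = np.zeros_like(g)
        r = g.copy()                      # residual of H s + g
        d = -r
        if np.linalg.norm(r) < tol:
            return s
        for _ in range(max_iter):
            Hd = hvp(d)
            dHd = d @ Hd
            if dHd <= 0:                  # negative curvature: go to the boundary
                return s + to_boundary(s, d, delta)
            alpha = (r @ r) / dHd
            s_next = s + alpha * d
            if np.linalg.norm(s_next) >= delta:
                return s + to_boundary(s, d, delta)
            r_next = r + alpha * Hd
            if np.linalg.norm(r_next) < tol:
                return s_next
            d = -r_next + ((r_next @ r_next) / (r @ r)) * d
            s, r = s_next, r_next
        return s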

Inexact nonconvex Newton-type methods

Z Yao, P Xu, F Roosta… - INFORMS Journal on …, 2021 - pubsonline.informs.org
For solving large-scale nonconvex problems, we propose inexact variants of trust region and
adaptive cubic regularization methods, which, to increase efficiency, incorporate various …

Distributed Newton methods for deep neural networks

CC Wang, KL Tan, CT Chen, YH Lin… - Neural …, 2018 - ieeexplore.ieee.org
Deep learning involves a difficult nonconvex optimization problem with a large number of
weights between any two adjacent layers of a deep structure. To handle large data sets or …

Block-diagonal Hessian-free optimization for training neural networks

H Zhang, C Xiong, J Bradbury, R Socher - arXiv preprint arXiv:1712.07296, 2017 - arxiv.org
Second-order methods for neural network optimization have several advantages over
methods based on first-order gradient descent, including better scaling to large mini-batch …
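Hessian-free here means the curvature matrix is never formed: Newton systems are solved by conjugate gradient driven by Hessian-vector products, and the block-diagonal variant ignores curvature between parameter blocks. The sketch below uses a finite-difference Hessian-vector product and per-block CG solves; the names, the damping term, and the finite-difference trick are assumptions, not the authors' implementation.

    import numpy as np

    def hvp_fd(grad_fn, w, v, eps=1e-4):
        # Finite-difference Hessian-vector product:
        #   H v ~ (grad(w + eps v) - grad(w - eps v)) / (2 eps)
        return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)

    def cg_solve(matvec, b, iters=50, tol=1e-8):
        # Plain conjugate gradient for matvec(x) = b.
        x = np.zeros_like(b)
        r = b - matvec(x)
        d = r.copy()
        for _ in range(iters):
            Ad = matvec(d)
            alpha = (r @ r) / (d @ Ad)
            x = x + alpha * d
            r_new = r - alpha * Ad
            if np.linalg.norm(r_new) < tol:
                break
            d = r_new + ((r_new @ r_new) / (r @ r)) * d
            r = r_new
        return x

    def block_newton_step(grad_fn, w, blocks, damping=1e-2):
        # Solve (H_bb + damping I) s_b = -g_b independently for each block b,
        # i.e. the block-diagonal approximation of the Newton system.
        g = grad_fn(w)
        step = np.zeros_like(w)
        for idx in blocks:                      # idx: integer index array
            def matvec(v_b, idx=idx):
                v = np.zeros_like(w)
                v[idx] = v_b
                return hvp_fd(grad_fn, w, v)[idx] + damping * v_b
            step[idx] = cg_solve(matvec, -g[idx])
        return step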

Newton methods for convolutional neural networks

CC Wang, KL Tan, CJ Lin - … on Intelligent Systems and Technology (TIST …, 2020 - dl.acm.org
Deep learning involves a difficult non-convex optimization problem, which is often solved by
stochastic gradient (SG) methods. While SG is usually effective, it may not be robust in some …

High-order automatic differentiation of unmodified linear algebra routines via nilpotent matrices

BZ Dunham - 2017 - search.proquest.com
This work presents a new automatic differentiation method, Nilpotent Matrix Differentiation
(NMD), capable of propagating any order of mixed or univariate derivative through common …
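The nilpotent-matrix idea can be previewed with a 2x2 example that is only the underlying identity, not NMD itself: for an analytic f, any unmodified routine built from matrix arithmetic, applied to the Jordan block [[x, 1], [0, x]], returns f(x) on the diagonal and f'(x) in the corner, because the off-diagonal part squares to zero.

    import numpy as np

    def jordan(x):
        # x*I + N with N nilpotent (N @ N = 0).
        return np.array([[x, 1.0], [0.0, x]])

    def routine(M):
        # An unmodified routine using only matrix products and sums:
        # p(M) = 3 M^3 - 2 M + 5 I.
        return 3 * np.linalg.matrix_power(M, 3) - 2 * M + 5 * np.eye(M.shape[0])

    x = 1.5
    out = routine(jordan(x))
    value = out[0, 0]        # equals 3 x^3 - 2 x + 5
    derivative = out[0, 1]   # equals 9 x^2 - 2, with no change to `routine`

Higher-order and mixed derivatives follow the same pattern with larger nilpotent blocks.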

Efficient Second-Order Methods for Non-Convex Optimization and Machine Learning

Z Yao - 2021 - search.proquest.com
Hessian-based analysis/computation is widely used in scientific computing. However, due to
the (incorrect, but in our experience widespread) belief that Hessian-based computations …
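As background on what Hessian-based computation without forming the Hessian can look like, here is a minimal sketch of Hutchinson's trace estimator driven only by Hessian-vector products; it is offered as a generic example, not as the thesis's method, and the probe count and names are assumptions.

    import numpy as np

    def hutchinson_trace(hvp, dim, n_probes=100, rng=None):
        # Estimate tr(H) as the average of z' H z over Rademacher probes z,
        # touching H only through hvp(z) = H z.
        rng = np.random.default_rng() if rng is None else rng
        total = 0.0
        for _ in range(n_probes):
            z = rng.integers(0, 2, size=dim) * 2.0 - 1.0   # entries in {-1, +1}
            total += z @ hvp(z)
        return total / n_probes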

Newton methods for convolutional neural networks

CC Wang, KL Tan, CJ Lin - arXiv preprint arXiv:1811.06100, 2018 - arxiv.org
Deep learning involves a difficult non-convex optimization problem, which is often solved by
stochastic gradient (SG) methods. While SG is usually effective, it may not be robust in some …