Newton-type methods for non-convex optimization under inexact Hessian information
We consider variants of trust-region and adaptive cubic regularization methods for non-
convex optimization, in which the Hessian matrix is approximated. Under certain condition …
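For reference (standard notation, not quoted from the abstract: g_k = \nabla f(x_k) and H_k \approx \nabla^2 f(x_k) is the inexact Hessian), the two method families solve the following subproblems at iterate x_k:

s_k \in \arg\min_{\|s\| \le \Delta_k} \; g_k^\top s + \tfrac{1}{2} s^\top H_k s \quad \text{(trust region)}

s_k \in \arg\min_{s} \; g_k^\top s + \tfrac{1}{2} s^\top H_k s + \tfrac{\sigma_k}{3} \|s\|^3 \quad \text{(adaptive cubic regularization)}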
[PDF] Distributed Second-Order Optimization using Kronecker-Factored Approximations.
As more computational resources become available, machine learning researchers train
ever larger neural networks on millions of data points using stochastic gradient descent …
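A minimal, single-layer sketch of the Kronecker-factored curvature idea behind this line of work (illustrative only; the function and variable names are my own, and this is not the paper's distributed implementation): for a fully connected layer, the curvature block is approximated by a Kronecker product of two small matrices built from the layer's input activations and pre-activation gradients, so the preconditioned update only needs two small inverses.

import numpy as np

def kfac_update(acts, grads_out, grad_W, damping=1e-3):
    """One K-FAC-style preconditioned step for a dense layer.

    acts:      (batch, d_in)  layer inputs a
    grads_out: (batch, d_out) gradients w.r.t. pre-activations s = W a
    grad_W:    (d_out, d_in)  gradient of the loss w.r.t. W
    The curvature block is approximated as A (x) G with A = E[a a^T],
    G = E[g g^T]; up to vec-ordering conventions, applying its inverse
    to grad_W reduces to G^{-1} grad_W A^{-1}.
    """
    n = acts.shape[0]
    A = acts.T @ acts / n                      # (d_in, d_in) activation factor
    G = grads_out.T @ grads_out / n            # (d_out, d_out) gradient factor
    A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
    G_inv = np.linalg.inv(G + damping * np.eye(G.shape[0]))
    return G_inv @ grad_W @ A_inv              # preconditioned update direction

# Toy usage with random data for a 4 -> 3 layer.
rng = np.random.default_rng(0)
acts = rng.normal(size=(32, 4))
grads_out = rng.normal(size=(32, 3))
grad_W = grads_out.T @ acts / 32
print(kfac_update(acts, grads_out, grad_W).shape)  # (3, 4)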
Inexact non-convex Newton-type methods
For solving large-scale non-convex problems, we propose inexact variants of trust region
and adaptive cubic regularization methods, which, to increase efficiency, incorporate various …
Distributed Newton methods for deep neural networks
Deep learning involves a difficult nonconvex optimization problem with a large number of
weights between any two adjacent layers of a deep structure. To handle large data sets or …
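A rough data-parallel sketch of the kind of computation such methods distribute (a toy linear least-squares example under my own naming, not the paper's algorithm): each worker forms Gauss-Newton matrix-vector products on its own data shard and the results are averaged, so the full curvature matrix is never materialized or communicated.

import numpy as np

def local_gnvp(X_shard, v, damping=1e-3):
    """Gauss-Newton product (X^T X / n + damping*I) v on one data shard.

    For a linear model f(w) = X w the Jacobian is X itself, so only
    matrix-vector products with the shard's data are needed.
    """
    n = X_shard.shape[0]
    return X_shard.T @ (X_shard @ v) / n + damping * v

def distributed_gnvp(shards, v):
    """Average the per-shard products, as a driver/parameter server would."""
    return sum(local_gnvp(X, v) for X in shards) / len(shards)

# Toy usage: 3 "workers", 5 parameters.
rng = np.random.default_rng(0)
shards = [rng.normal(size=(100, 5)) for _ in range(3)]
v = rng.normal(size=5)
print(distributed_gnvp(shards, v))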
Block-diagonal Hessian-free optimization for training neural networks
Second-order methods for neural network optimization have several advantages over
methods based on first-order gradient descent, including better scaling to large mini-batch …
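A compact illustration of the block-diagonal Hessian-free idea (a toy sketch under my own notation, not the paper's code): Hessian-vector products are obtained without ever forming the Hessian, here via a finite difference of gradients, and the Newton system is solved by conjugate gradients independently for each parameter block (e.g., per layer), which is what the block-diagonal approximation buys.

import numpy as np

def hvp(grad_fn, w, v, eps=1e-5):
    """Hessian-vector product via a finite difference of gradients."""
    return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)

def cg(matvec, b, iters=50, tol=1e-8):
    """Plain conjugate gradients for matvec(x) = b."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Toy problem: a coupled quadratic loss split into two parameter blocks.
A = np.array([[3.0, 0.5, 0.1, 0.0],
              [0.5, 2.0, 0.0, 0.1],
              [0.1, 0.0, 4.0, 0.3],
              [0.0, 0.1, 0.3, 1.5]])
grad_fn = lambda w: A @ w - np.ones(4)
w = np.zeros(4)
blocks = [slice(0, 2), slice(2, 4)]             # e.g., one block per layer

step = np.zeros(4)
for blk in blocks:
    def block_matvec(v_blk, blk=blk):
        v = np.zeros(4)
        v[blk] = v_blk
        return hvp(grad_fn, w, v)[blk]          # keep only this block's rows
    step[blk] = cg(block_matvec, -grad_fn(w)[blk])
print(step)                                     # approximate block-Newton step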
Newton methods for convolutional neural networks
Deep learning involves a difficult non-convex optimization problem, which is often solved by
stochastic gradient (SG) methods. While SG is usually effective, it may not be robust in some …
High-order automatic differentiation of unmodified linear algebra routines via nilpotent matrices
BZ Dunham - 2017 - search.proquest.com
This work presents a new automatic differentiation method, Nilpotent Matrix Differentiation
(NMD), capable of propagating any order of mixed or univariate derivative through common …
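The first-order, univariate core of this idea fits in a few lines (a sketch of the general principle only, not of the dissertation's implementation, which also covers higher-order and mixed derivatives): replace the input x by the 2x2 matrix x*I + N with N nilpotent (N @ N = 0); pushing that matrix through ordinary polynomial and linear-algebra operations yields f(x) on the diagonal and f'(x) in the off-diagonal entry, since f(x*I + N) = f(x)*I + f'(x)*N.

import numpy as np

def nilpotent_input(x):
    """Represent x as x*I + N, where N = [[0, 1], [0, 0]] is nilpotent."""
    return np.array([[x, 1.0],
                     [0.0, x]])

def f(X):
    """f(x) = x**3 + 2*x, written only with matrix products and sums."""
    return X @ X @ X + 2.0 * X

X = nilpotent_input(1.5)
FX = f(X)
print(FX[0, 0])  # value:      1.5**3 + 2*1.5 = 6.375
print(FX[0, 1])  # derivative: 3*1.5**2 + 2   = 8.75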
[BOOK][B] Efficient Second-Order Methods for Non-Convex Optimization and Machine Learning
Z Yao - 2021 - search.proquest.com
Hessian-based analysis/computation is widely used in scientific computing. However, due to
the (incorrect, but in our experience widespread) belief that Hessian-based computations …