Biased stochastic conjugate gradient algorithm with adaptive step size for nonconvex problems
R Huang, Y Qin, K Liu, G Yuan - Expert Systems with Applications, 2024 - Elsevier
Conjugate gradient (CG) algorithms are widely applied to machine learning problems owing
to their low computational cost compared with second-order methods and better convergence …
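The excerpt cuts off before the method details, so what follows is only a minimal sketch of the classical nonlinear CG template the paper builds on: a Fletcher–Reeves direction update with a backtracking line search standing in for the paper's adaptive step size. The biased stochastic estimator itself is not reproduced; the quadratic objective and all constants are illustrative assumptions.

```python
import numpy as np

# Toy strongly convex quadratic: f(x) = 0.5 * x^T A x - b^T x.
rng = np.random.default_rng(0)
M = rng.standard_normal((20, 10))
A = M.T @ M / 20.0 + 0.1 * np.eye(10)    # keep A safely positive definite

b = rng.standard_normal(10)

def f(x):
    return 0.5 * x @ A @ x - b @ x

def grad(x):
    return A @ x - b

x = np.zeros(10)
g = grad(x)
d = -g                                   # first direction: steepest descent
for k in range(100):
    if d @ g >= 0:                       # safeguard: restart if not a descent direction
        d = -g
    eta = 1.0                            # backtracking stands in for the adaptive step
    while f(x + eta * d) > f(x) + 1e-4 * eta * (g @ d):
        eta *= 0.5
    x = x + eta * d
    g_new = grad(x)
    beta = (g_new @ g_new) / (g @ g)     # Fletcher-Reeves conjugacy coefficient
    d = -g_new + beta * d
    g = g_new
print("final gradient norm:", np.linalg.norm(g))
```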
SPIDER: Near-optimal non-convex optimization via stochastic path-integrated differential estimator
C Fang, CJ Li, Z Lin, T Zhang - Advances in Neural Information Processing Systems, 2018 - proceedings.neurips.cc
In this paper, we propose a new technique named Stochastic Path-Integrated Differential EstimatoR (SPIDER), which can be used to track many deterministic quantities of …
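A minimal sketch of the path-integrated estimator on a toy finite-sum least-squares problem: the running estimate v is reset with a full gradient every q steps and otherwise updated with gradient differences along the iterate path. The batch sizes, period q, and step size are illustrative, not the paper's theoretical choices.

```python
import numpy as np

# Toy finite sum: f(x) = (1/n) * sum_i 0.5 * (a_i . x - b_i)^2
rng = np.random.default_rng(0)
n, dim = 512, 10
A = rng.standard_normal((n, dim))
b = rng.standard_normal(n)

def grad_batch(x, idx):
    Ai = A[idx]
    return Ai.T @ (Ai @ x - b[idx]) / len(idx)

x = np.zeros(dim)
x_prev = x.copy()
q, S, eta = 32, 64, 0.05                 # illustrative constants
for k in range(200):
    if k % q == 0:
        v = grad_batch(x, np.arange(n))  # periodic full-gradient reset
    else:
        idx = rng.choice(n, size=S, replace=False)
        # Path-integrated update: add the gradient *difference* between
        # consecutive iterates, keeping a low-variance running estimate.
        v = v + grad_batch(x, idx) - grad_batch(x_prev, idx)
    x_prev = x
    x = x - eta * v / max(np.linalg.norm(v), 1e-12)  # normalized step (SPIDER-SFO)
print("final gradient norm:", np.linalg.norm(grad_batch(x, np.arange(n))))
```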
Momentum-based variance reduction in non-convex sgd
A Cutkosky, F Orabona - Advances in neural information …, 2019 - proceedings.neurips.cc
Variance reduction has emerged in recent years as a strong competitor to stochastic
gradient descent in non-convex problems, providing the first algorithms to improve upon the …
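A minimal sketch of the momentum-based estimator (STORM-style) on the same kind of toy finite-sum problem, assuming fixed momentum and step-size constants; the paper's algorithm adapts both on the fly. The key mechanic: one fresh sample per step, evaluated at both the current and previous iterate, with no checkpoints or large batches.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 512, 10
A = rng.standard_normal((n, dim))
b = rng.standard_normal(n)

def grad_sample(x, i):                   # gradient of 0.5 * (a_i . x - b_i)^2
    return A[i] * (A[i] @ x - b[i])

x = np.zeros(dim)
i = rng.integers(n)
d = grad_sample(x, i)                    # initialize from one stochastic gradient
eta, alpha = 0.02, 0.1                   # illustrative; the paper adapts these
for t in range(2000):
    x_prev, x = x, x - eta * d
    i = rng.integers(n)                  # ONE fresh sample per step...
    # ...evaluated at both iterates: a momentum term plus a
    # variance-reduction correction in a single recursion.
    d = grad_sample(x, i) + (1 - alpha) * (d - grad_sample(x_prev, i))
print("final gradient norm:", np.linalg.norm(A.T @ (A @ x - b) / n))
```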
SARAH: A novel method for machine learning problems using stochastic recursive gradient
LM Nguyen, J Liu, K Scheinberg, M Takáč - International Conference on Machine Learning, 2017 - proceedings.mlr.press
In this paper, we propose a StochAstic Recursive grAdient algoritHm (SARAH), as well as its
practical variant SARAH+, as a novel approach to finite-sum minimization problems …
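A minimal sketch of the SARAH recursion on a toy least-squares finite sum: each outer epoch anchors at a full gradient, and the inner loop updates the estimator recursively against the previous iterate rather than a fixed snapshot (the key difference from SVRG). Constants are illustrative, and SARAH+'s adaptive inner-loop stopping rule is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 512, 10
A = rng.standard_normal((n, dim))
b = rng.standard_normal(n)

def grad_sample(w, i):
    return A[i] * (A[i] @ w - b[i])

def full_grad(w):
    return A.T @ (A @ w - b) / n

w = np.zeros(dim)
eta, m = 0.05, 64                        # illustrative step size and inner length
for epoch in range(5):
    w_prev = w.copy()
    v = full_grad(w)                     # anchor the recursion at a full gradient
    w = w - eta * v
    for _ in range(m):
        i = rng.integers(n)
        # Recursive estimator: the reference point is the *previous iterate*,
        # so v tracks the gradient along the optimization path.
        v = grad_sample(w, i) - grad_sample(w_prev, i) + v
        w_prev = w.copy()
        w = w - eta * v
print("final gradient norm:", np.linalg.norm(full_grad(w)))
```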
Breaking the centralized barrier for cross-device federated learning
SP Karimireddy, M Jaggi, S Kale, M Mohri… - Advances in Neural Information Processing Systems, 2021 - proceedings.neurips.cc
Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of
the data across different clients, which gives rise to the client drift phenomenon. In fact …
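Not the paper's method, just a tiny numeric illustration of the client drift phenomenon the snippet names: two clients with hypothetical, heterogeneous quadratic objectives run FedAvg-style local steps, and the server average settles away from the true optimum of the average objective.

```python
# Two clients with heterogeneous quadratics f_k(x) = 0.5 * h_k * (x - c_k)^2.
# During long local phases each client drifts toward its own optimum c_k,
# so naive server averaging settles away from the true global optimum.
h = [1.0, 0.1]                       # hypothetical local curvatures
c = [3.0, -3.0]                      # hypothetical local optima
true_opt = sum(hk * ck for hk, ck in zip(h, c)) / sum(h)

x_global, eta, local_steps = 0.0, 0.1, 50
for rnd in range(200):
    endpoints = []
    for hk, ck in zip(h, c):
        x = x_global
        for _ in range(local_steps):     # more local steps => more drift
            x -= eta * hk * (x - ck)     # local gradient step on f_k
        endpoints.append(x)
    x_global = sum(endpoints) / len(endpoints)   # server averages the models

print(f"true optimum {true_opt:.3f} vs averaged fixed point {x_global:.3f}")
```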
Stochastic nested variance reduction for nonconvex optimization
D Zhou, P Xu, Q Gu - Advances in Neural Information Processing Systems, 2018 - proceedings.neurips.cc
We study nonconvex optimization problems, where the objective function is either an
average of n nonconvex functions or the expectation of some stochastic function. We …
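A deliberately simplified two-level sketch of the nested idea: reference points and correction terms are refreshed at different frequencies with shrinking batch sizes, and the gradient estimate telescopes across the levels. The actual algorithm uses K nested levels with carefully scheduled batch sizes; every constant here is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 1024, 10
A = rng.standard_normal((n, dim))
b = rng.standard_normal(n)

def g(x, idx):                            # minibatch gradient of a least-squares sum
    Ai = A[idx]
    return Ai.T @ (Ai @ x - b[idx]) / len(idx)

x = np.zeros(dim)
eta = 0.05
T1, T2 = 64, 8                            # refresh periods of the two levels
B2, B3 = 128, 8                           # batch sizes shrink with the level
for k in range(256):
    if k % T1 == 0:                       # slow level: accurate but expensive
        x1, v1 = x.copy(), g(x, np.arange(n))
    if k % T2 == 0:                       # fast level: cheap correction term
        x2 = x.copy()
        S = rng.choice(n, size=B2, replace=False)
        v2 = g(x2, S) - g(x1, S)          # estimates grad f(x2) - grad f(x1)
    idx = rng.choice(n, size=B3, replace=False)
    v = v1 + v2 + g(x, idx) - g(x2, idx)  # telescoped estimate of grad f(x)
    x -= eta * v
print("final gradient norm:", np.linalg.norm(g(x, np.arange(n))))
```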
Spiderboost and momentum: Faster variance reduction algorithms
Z Wang, K Ji, Y Zhou, Y Liang, V Tarokh - Advances in Neural Information Processing Systems, 2019 - proceedings.neurips.cc
SARAH and SPIDER are two recently developed stochastic variance-reduced algorithms,
and SPIDER has been shown to achieve a near-optimal first-order oracle complexity in …
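A minimal sketch isolating the SpiderBoost modification: the same SARAH/SPIDER-style recursive estimator as above, but with a plain constant step size instead of SPIDER's normalized, accuracy-dependent step. The momentum variant is omitted, and all constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 512, 10
A = rng.standard_normal((n, dim))
b = rng.standard_normal(n)

def g(x, idx):
    Ai = A[idx]
    return Ai.T @ (Ai @ x - b[idx]) / len(idx)

x = np.zeros(dim)
x_prev = x.copy()
q, S, eta = 32, 64, 0.1                  # constant step size is the point here
for k in range(200):
    if k % q == 0:
        v = g(x, np.arange(n))           # same recursive estimator as SPIDER
    else:
        idx = rng.choice(n, size=S, replace=False)
        v = v + g(x, idx) - g(x_prev, idx)
    x_prev = x
    x = x - eta * v                      # plain (unnormalized) step: the
                                         # SpiderBoost change vs SPIDER's
                                         # normalized, epsilon-scaled step
print("final gradient norm:", np.linalg.norm(g(x, np.arange(n))))
```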
ProxSARAH: An efficient algorithmic framework for stochastic composite nonconvex optimization
NH Pham, LM Nguyen, DT Phan, Q Tran-Dinh - Journal of Machine Learning Research, 2020 - jmlr.org
We propose a new stochastic first-order algorithmic framework to solve stochastic composite
nonconvex optimization problems that covers both finite-sum and expectation settings. Our …
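A minimal sketch of the composite setting on a toy l1-regularized least-squares problem, assuming a SARAH estimator followed by a proximal step (soft-thresholding for the hypothetical l1 term). The paper's framework additionally uses an averaging step and tuned parameters that are omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim = 512, 20
A = rng.standard_normal((n, dim))
b = rng.standard_normal(n)
lam = 0.1                                 # hypothetical l1 regularization weight

def grad_sample(w, i):
    return A[i] * (A[i] @ w - b[i])

def prox_l1(z, t):                        # prox of t*lam*||.||_1: soft-threshold
    return np.sign(z) * np.maximum(np.abs(z) - t * lam, 0.0)

w = np.zeros(dim)
eta, m = 0.05, 64                         # illustrative constants
for epoch in range(5):
    w_prev = w.copy()
    v = A.T @ (A @ w - b) / n             # full-gradient anchor (outer step)
    w = prox_l1(w - eta * v, eta)
    for _ in range(m):
        i = rng.integers(n)
        v = grad_sample(w, i) - grad_sample(w_prev, i) + v   # SARAH recursion
        w_prev = w.copy()
        w = prox_l1(w - eta * v, eta)     # proximal step handles the nonsmooth part
print("nonzero coordinates:", int(np.count_nonzero(w)))
```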
Recent theoretical advances in non-convex optimization
M Danilova, P Dvurechensky, A Gasnikov, E Gorbunov… - High-Dimensional Optimization and Probability, 2022 - Springer
Motivated by recent increased interest in algorithms for non-convex optimization, with applications to training deep neural networks and other optimization problems …
A multi-batch L-BFGS method for machine learning
AS Berahas, J Nocedal… - Advances in Neural …, 2016 - proceedings.neurips.cc
The question of how to parallelize the stochastic gradient descent (SGD) method has
received much attention in the literature. In this paper, we focus instead on batch methods …
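A minimal sketch of the overlap idea on a toy least-squares problem: consecutive batches share a fixed fraction of samples, and the curvature pair (s, y) is computed from gradients on the overlap only, so the quasi-Newton update compares the same data at both points. The two-loop recursion is the textbook one; the batch size, overlap fraction, and fixed step size are illustrative (the paper uses a line search).

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
n, dim = 1024, 10
A = rng.standard_normal((n, dim))
b = rng.standard_normal(n)

def grad(x, idx):
    Ai = A[idx]
    return Ai.T @ (Ai @ x - b[idx]) / len(idx)

def two_loop(g, pairs):                  # standard L-BFGS two-loop recursion
    q, alphas = g.copy(), []
    for s, y in reversed(pairs):         # newest pair first
        a = (s @ q) / (y @ s)
        alphas.append(a)
        q -= a * y
    if pairs:
        s, y = pairs[-1]
        q *= (y @ s) / (y @ y)           # initial Hessian scaling
    for (s, y), a in zip(pairs, reversed(alphas)):
        q += (a - (y @ q) / (y @ s)) * s
    return q                             # approximates H^{-1} g

x = np.zeros(dim)
pairs = deque(maxlen=10)                 # limited memory of (s, y) pairs
eta, batch, overlap_frac = 0.5, 256, 0.25
idx = rng.choice(n, size=batch, replace=False)
for k in range(50):
    d = two_loop(grad(x, idx), list(pairs))
    x_new = x - eta * d
    # Draw the next batch so it OVERLAPS the current one; the curvature
    # pair is computed on the overlap only, so y compares the same data
    # at both points (the stabilizing idea of the multi-batch scheme).
    keep = idx[: int(batch * overlap_frac)]
    fresh = rng.choice(np.setdiff1d(np.arange(n), keep),
                       size=batch - len(keep), replace=False)
    y = grad(x_new, keep) - grad(x, keep)
    s = x_new - x
    if y @ s > 1e-10:                    # curvature condition safeguard
        pairs.append((s, y))
    x, idx = x_new, np.concatenate([keep, fresh])
print("final gradient norm:", np.linalg.norm(grad(x, np.arange(n))))
```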