Recent advances in stochastic gradient descent in deep learning

Y Tian, Y Zhang, H Zhang - Mathematics, 2023 - mdpi.com
In the age of artificial intelligence, finding the best approach to handling huge amounts of data is a
tremendously motivating and hard problem. Among machine learning models, stochastic …
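
For reference alongside the entries below, here is a minimal sketch of plain stochastic gradient descent on a synthetic least-squares problem; the data, step size, and iteration count are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

# Minimal SGD sketch on an assumed synthetic least-squares problem
rng = np.random.default_rng(0)
A = rng.normal(size=(500, 20))
x_true = rng.normal(size=20)
b = A @ x_true + 0.1 * rng.normal(size=500)

x = np.zeros(20)
lr = 0.01                                 # assumed constant step size
for k in range(5000):
    i = rng.integers(len(b))              # pick one example uniformly at random
    g = (A[i] @ x - b[i]) * A[i]          # unbiased estimate of the full gradient
    x -= lr * g                           # SGD step
```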

Variance-reduced methods for machine learning

RM Gower, M Schmidt, F Bach… - Proceedings of the …, 2020 - ieeexplore.ieee.org
Stochastic optimization lies at the heart of machine learning, and its cornerstone is
stochastic gradient descent (SGD), a method introduced over 60 years ago. The last eight …
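
To make the variance-reduction idea concrete, here is a rough SVRG-style sketch for a finite-sum least-squares objective f(x) = (1/n) * sum_i 0.5*(a_i^T x - b_i)^2; the data, step size, and inner-loop length are assumptions for illustration, not the survey's own pseudocode.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 20
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def grad_i(x, i):
    return (A[i] @ x - b[i]) * A[i]       # gradient of one summand

def full_grad(x):
    return A.T @ (A @ x - b) / n          # gradient of the full finite sum

x = np.zeros(d)
lr, m = 0.01, n                           # assumed step size; inner-loop length m = n is a common choice
for epoch in range(30):
    w = x.copy()                          # reference point
    mu = full_grad(w)                     # full gradient at the reference point
    for _ in range(m):
        i = rng.integers(n)
        g = grad_i(x, i) - grad_i(w, i) + mu   # variance-reduced (control-variate) estimate
        x -= lr * g
```

The key point is the control variate: grad_i(x, i) - grad_i(w, i) + mu is still an unbiased gradient estimate, but its variance shrinks as x approaches the reference point w.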

A survey of optimization methods from a machine learning perspective

S Sun, Z Cao, H Zhu, J Zhao - IEEE Transactions on Cybernetics, 2019 - ieeexplore.ieee.org
Machine learning is developing rapidly; it has produced many theoretical breakthroughs and is
widely applied in various fields. Optimization, as an important part of machine learning, has …

Federated optimization: Distributed machine learning for on-device intelligence

J Konečný, HB McMahan, D Ramage… - arXiv preprint arXiv …, 2016 - arxiv.org
We introduce a new and increasingly relevant setting for distributed optimization in machine
learning, where the data defining the optimization are unevenly distributed over an …
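
As a generic illustration of this setting (not the specific algorithms proposed in the paper), the sketch below runs a few local SGD steps on each client's unevenly sized shard and then averages the models on the server, weighting by local dataset size; the data, step size, and round counts are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_clients = 20, 5
x_true = rng.normal(size=d)

# Unevenly sized client datasets, as in the federated setting
clients = []
for c in range(n_clients):
    n_c = rng.integers(20, 200)
    A_c = rng.normal(size=(n_c, d))
    b_c = A_c @ x_true + 0.1 * rng.normal(size=n_c)
    clients.append((A_c, b_c))

x_global = np.zeros(d)
lr, local_steps = 0.01, 10                        # assumed hyperparameters

for rnd in range(100):
    updates, weights = [], []
    for A_c, b_c in clients:
        x = x_global.copy()
        for _ in range(local_steps):              # local SGD on the client's own shard
            i = rng.integers(len(b_c))
            x -= lr * (A_c[i] @ x - b_c[i]) * A_c[i]
        updates.append(x)
        weights.append(len(b_c))
    # Server aggregation: weighted average by local dataset size
    x_global = np.average(updates, axis=0, weights=weights)
```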

An improved analysis of (variance-reduced) policy gradient and natural policy gradient methods

Y Liu, K Zhang, T Basar, W Yin - Advances in Neural …, 2020 - proceedings.neurips.cc
In this paper, we revisit and improve the convergence of policy gradient (PG), natural PG
(NPG) methods, and their variance-reduced variants, under general smooth policy …
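
For orientation, here is a vanilla REINFORCE-style policy gradient sketch on a toy bandit with a softmax policy; the variance-reduced PG/NPG variants analyzed in the paper are more elaborate, and the rewards, baseline, batch size, and step size below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 4
true_means = np.array([0.2, 0.5, 0.9, 0.1])    # assumed arm rewards
theta = np.zeros(n_actions)                    # softmax policy parameters

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

lr, batch = 0.1, 32
baseline = 0.0
for step in range(300):
    pi = softmax(theta)
    grad = np.zeros(n_actions)
    for _ in range(batch):
        a = rng.choice(n_actions, p=pi)
        r = true_means[a] + 0.1 * rng.normal()
        # REINFORCE estimator: (r - baseline) * grad log pi(a),
        # with grad log pi(a) = e_a - pi for a softmax policy
        grad += (r - baseline) * (np.eye(n_actions)[a] - pi)
        baseline = 0.99 * baseline + 0.01 * r   # running-average baseline to reduce variance
    theta += lr * grad / batch                  # gradient ascent on the expected reward

print(softmax(theta))                           # probability mass concentrates on the best arm
```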

Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods

N Loizou, P Richtárik - Computational Optimization and Applications, 2020 - Springer
In this paper we study several classes of stochastic optimization algorithms enriched with
heavy ball momentum. Among the methods studied are: stochastic gradient descent …
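
A minimal sketch of the stochastic heavy-ball update on a synthetic least-squares problem, assuming an illustrative step size and momentum parameter:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 10))
b = A @ rng.normal(size=10) + 0.01 * rng.normal(size=200)

x = np.zeros(10)
x_prev = x.copy()
lr, beta = 0.01, 0.9                              # assumed step size and momentum parameter

for k in range(5000):
    i = rng.integers(len(b))                      # sample one data point uniformly
    g = (A[i] @ x - b[i]) * A[i]                  # stochastic gradient of 0.5*(a_i^T x - b_i)^2
    x_next = x - lr * g + beta * (x - x_prev)     # heavy-ball step: SGD plus beta*(x_k - x_{k-1})
    x_prev, x = x, x_next
```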

Don't jump through hoops and remove those loops: SVRG and Katyusha are better without the outer loop

D Kovalev, S Horváth… - Algorithmic Learning …, 2020 - proceedings.mlr.press
The stochastic variance-reduced gradient method (SVRG) and its accelerated variant
(Katyusha) have attracted enormous attention in the machine learning community in the last …
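
To illustrate the "loopless" idea, the sketch below replaces SVRG's inner loop with a coin flip that refreshes the reference point with small probability p; the data, step size, and the choice p ≈ 1/n are assumptions in the spirit of the paper rather than its exact pseudocode.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 20
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def grad_i(x, i):
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    return A.T @ (A @ x - b) / n

x = np.zeros(d)
w = x.copy()                              # reference point
mu = full_grad(w)                         # full gradient at the reference point
lr, p = 0.01, 1.0 / n                     # assumed step size; refresh probability p ~ 1/n

for k in range(20000):
    i = rng.integers(n)
    g = grad_i(x, i) - grad_i(w, i) + mu  # same control-variate estimator as SVRG
    x -= lr * g
    if rng.random() < p:                  # coin flip replaces the outer loop
        w = x.copy()
        mu = full_grad(w)
```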

Distributed optimization with arbitrary local solvers

C Ma, J Konečný, M Jaggi, V Smith… - Optimization Methods …, 2017 - Taylor & Francis
With the growth of data and necessity for distributed optimization methods, solvers that work
well on a single machine must be re-designed to leverage distributed computation. Recent …

Linear convergence of natural policy gradient methods with log-linear policies

R Yuan, SS Du, RM Gower, A Lazaric… - … Conference on Learning …, 2023 - par.nsf.gov
We consider infinite-horizon discounted Markov decision processes and study the
convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log …
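
As a toy illustration of a natural policy gradient step with a softmax (log-linear) policy, the sketch below preconditions the policy gradient with a damped Fisher information matrix on a bandit problem; the rewards, step size, and damping are assumptions, and this is not the Q-NPG scheme analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 4
r = np.array([1.0, 0.5, 0.2, 0.0])       # assumed mean rewards
theta = np.zeros(n_actions)              # log-linear (softmax) policy parameters

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

eta, damping = 0.2, 1e-6                 # assumed step size and Fisher damping
for step in range(100):
    pi = softmax(theta)
    # Gradient of the expected reward J(theta) = sum_a pi_a r_a;
    # the softmax Jacobian is diag(pi) - pi pi^T
    grad = (np.diag(pi) - np.outer(pi, pi)) @ r
    # Fisher information F = E_a[(e_a - pi)(e_a - pi)^T] = diag(pi) - pi pi^T
    F = np.diag(pi) - np.outer(pi, pi)
    # Damped natural gradient step (F is singular along the all-ones direction)
    theta += eta * np.linalg.solve(F + damping * np.eye(n_actions), grad)

print(softmax(theta))                    # the policy concentrates on the highest-reward action
```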

Stochastic quasi-Newton methods for nonconvex stochastic optimization

X Wang, S Ma, D Goldfarb, W Liu - SIAM Journal on Optimization, 2017 - SIAM
In this paper we study stochastic quasi-Newton methods for nonconvex stochastic
optimization, where we assume that noisy information about the gradients of the objective …
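
A rough stochastic L-BFGS-style sketch in the same spirit (a generic illustration, not the specific algorithms analyzed in the paper): gradients are estimated on minibatches, curvature pairs are formed from gradient differences on the same minibatch so they stay consistent, and the search direction comes from the standard two-loop recursion; the problem data and hyperparameters are assumptions.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def batch_grad(x, idx):
    r = A[idx] @ x - b[idx]
    return A[idx].T @ r / len(idx)        # minibatch gradient estimate

def two_loop(g, pairs):
    """Approximate H^{-1} g with the standard L-BFGS two-loop recursion."""
    q = g.copy()
    alphas = []
    for s, y in reversed(pairs):          # newest pair first
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        q -= a * y
        alphas.append((a, rho, s, y))
    if pairs:
        s, y = pairs[-1]
        q *= (s @ y) / (y @ y)            # initial Hessian scaling
    for a, rho, s, y in reversed(alphas): # oldest pair first
        beta = rho * (y @ q)
        q += (a - beta) * s
    return q

x = np.zeros(d)
pairs = deque(maxlen=10)                  # limited memory of curvature pairs (s, y)
lr, batch = 0.1, 32                       # assumed step size and minibatch size

for k in range(500):
    idx = rng.integers(n, size=batch)
    g = batch_grad(x, idx)
    direction = two_loop(g, list(pairs))  # quasi-Newton direction (plain gradient if no pairs yet)
    x_new = x - lr * direction
    # Curvature pair from the SAME minibatch; skip it if curvature is not positive
    y = batch_grad(x_new, idx) - g
    s = x_new - x
    if y @ s > 1e-10:
        pairs.append((s, y))
    x = x_new
```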