Recent advances in stochastic gradient descent in deep learning
In the age of artificial intelligence, finding the best way to handle huge amounts of data is a
tremendously motivating and hard problem. Among machine learning models, stochastic …
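Since every entry below builds on the basic SGD update, a minimal sketch on a toy least-squares problem (data, names, and step size are purely illustrative, not from the paper):

```python
import numpy as np

def sgd_step(w, grad_fn, batch, lr=0.01):
    """One SGD step: move against a gradient estimated on a mini-batch."""
    return w - lr * grad_fn(w, batch)

# Toy usage: least squares on random data (illustrative only).
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
grad = lambda w, idx: 2 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)

w = np.zeros(5)
for _ in range(200):
    batch = rng.choice(100, size=10, replace=False)  # sample a mini-batch
    w = sgd_step(w, grad, batch)
```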
Variance-reduced methods for machine learning
Stochastic optimization lies at the heart of machine learning, and its cornerstone is
stochastic gradient descent (SGD), a method introduced over 60 years ago. The last eight …
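A minimal sketch of the variance-reduction idea in SVRG form, assuming a per-sample gradient oracle grad_i (all names and toy data are illustrative):

```python
import numpy as np

def svrg(w, grad_i, n, lr=0.05, epochs=5, inner=50, seed=0):
    """SVRG-style variance reduction: correct each stochastic gradient with a
    periodic snapshot of the full gradient, so variance vanishes near optima."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        snapshot = w.copy()
        full_grad = np.mean([grad_i(snapshot, i) for i in range(n)], axis=0)
        for _ in range(inner):
            i = rng.integers(n)
            v = grad_i(w, i) - grad_i(snapshot, i) + full_grad  # reduced-variance estimate
            w = w - lr * v
    return w

# Toy usage on least squares (illustrative data and oracle).
rng = np.random.default_rng(1)
X, y = rng.normal(size=(50, 3)), rng.normal(size=50)
w = svrg(np.zeros(3), lambda w, i: 2 * X[i] * (X[i] @ w - y[i]), n=50)
```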
AdaBelief optimizer: Adapting stepsizes by the belief in observed gradients
Most popular optimizers for deep learning can be broadly categorized as adaptive methods
(e.g., Adam) and accelerated schemes (e.g., stochastic gradient descent (SGD) with …
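A sketch of the core AdaBelief update: Adam-style moments, except the second moment tracks the deviation of the gradient from its EMA "belief" rather than the raw squared gradient (hyperparameters illustrative):

```python
import numpy as np

def adabelief_step(w, g, m, s, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One AdaBelief step: like Adam, but the second moment tracks (g - m)^2,
    the deviation of the gradient from its EMA 'belief', not g^2."""
    m = b1 * m + (1 - b1) * g
    s = b2 * s + (1 - b2) * (g - m) ** 2   # belief deviation, the key change vs. Adam
    m_hat = m / (1 - b1 ** t)              # bias corrections, as in Adam
    s_hat = s / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s
```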
SCAFFOLD: Stochastic controlled averaging for federated learning
Federated learning is a key scenario in modern large-scale machine learning where the
data remains distributed over a large number of clients and the task is to learn a centralized …
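A sketch of SCAFFOLD's drift-corrected local step, assuming grad_fn is a client's stochastic gradient oracle and c_i, c are the client and server control variates (the control-variate refresh follows the spirit of the paper's option II, not verbatim):

```python
import numpy as np

def scaffold_client_update(x, grad_fn, c_i, c, lr=0.1, local_steps=10):
    """SCAFFOLD-style local update: control variates (c_i client-side, c
    server-side) correct each local step so client drift is reduced."""
    y = x.copy()
    for _ in range(local_steps):
        y = y - lr * (grad_fn(y) - c_i + c)          # drift-corrected step
    # Control-variate refresh, sketched after the paper's option II.
    c_i_new = c_i - c + (x - y) / (local_steps * lr)
    return y, c_i_new
```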
Large batch optimization for deep learning: Training BERT in 76 minutes
Training large deep neural networks on massive datasets is computationally very
challenging. There has been a recent surge of interest in using large batch stochastic …
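A sketch of the layerwise trust-ratio update behind LAMB, the optimizer this paper introduces for large-batch BERT training (one layer shown; hyperparameters illustrative):

```python
import numpy as np

def lamb_layer_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-6, wd=0.01):
    """One LAMB step for a single layer: an Adam-style direction rescaled by a
    layerwise trust ratio ||w|| / ||direction||, stabilizing very large batches."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
    r = m_hat / (np.sqrt(v_hat) + eps) + wd * w      # Adam direction + weight decay
    trust = np.linalg.norm(w) / (np.linalg.norm(r) + eps)
    return w - lr * trust * r, m, v
```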
A survey of optimization methods from a machine learning perspective
Machine learning has developed rapidly, achieving many theoretical breakthroughs, and is
widely applied in various fields. Optimization, as an important part of machine learning, has …
Decentralized federated averaging
Federated averaging (FedAvg) is a communication-efficient algorithm for distributed training
with an enormous number of clients. In FedAvg, clients keep their data locally for privacy …
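A minimal sketch of one FedAvg round, assuming a hypothetical local_fn that runs a client's local training and returns its model weights (names illustrative):

```python
import numpy as np

def fedavg_round(global_w, client_data, local_fn):
    """One FedAvg round: each client trains locally on its own data; the server
    averages the returned models, weighted by client data size."""
    local_ws = [local_fn(global_w.copy(), d) for d in client_data]
    weights = np.array([len(d) for d in client_data], dtype=float)
    weights /= weights.sum()                         # weight by data size
    return sum(a * w for a, w in zip(weights, local_ws))
```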
Learning to reweight examples for robust deep learning
Deep neural networks have been shown to be very powerful modeling tools for many
supervised learning tasks involving complex input patterns. However, they can also easily …
SPIDER: Near-optimal non-convex optimization via stochastic path-integrated differential estimator
In this paper, we propose a new technique named Stochastic Path-Integrated
Differential EstimatoR (SPIDER), which can be used to track many deterministic quantities of …
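A sketch of SPIDER's path-integrated gradient tracking: occasional full-gradient refreshes, with low-variance gradient differences in between (grad_fn, q, and the batch size are illustrative assumptions):

```python
import numpy as np

def spider_estimates(grad_fn, n, x_path, q=10, batch=5, seed=0):
    """SPIDER-style tracking of the gradient along an iterate path: refresh with
    the full gradient every q steps, otherwise add gradient differences
    evaluated on the *same* mini-batch at x_t and x_{t-1}."""
    rng = np.random.default_rng(seed)
    v, vs = None, []
    for t, x in enumerate(x_path):
        if t % q == 0:
            v = np.mean([grad_fn(x, i) for i in range(n)], axis=0)  # full refresh
        else:
            idx = rng.choice(n, size=batch, replace=False)
            v = v + np.mean([grad_fn(x, i) - grad_fn(x_path[t - 1], i)
                             for i in idx], axis=0)  # path-integrated correction
        vs.append(v)
    return vs
```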
Derivative-free optimization methods
In many optimization problems arising from scientific, engineering and artificial intelligence
applications, objective and constraint functions are available only as the output of a black …
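A sketch of the simplest derivative-free building block, a forward-difference gradient estimate for a black-box f (step size h illustrative):

```python
import numpy as np

def fd_gradient(f, x, h=1e-6):
    """Forward-difference gradient estimate: a basic building block of many
    derivative-free methods when f is only available as a black box."""
    fx = f(x)
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - fx) / h
    return g

# Toy usage: recover the gradient of a quadratic, approx [2, -4, 6].
print(fd_gradient(lambda z: float(z @ z), np.array([1.0, -2.0, 3.0])))
```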