Recent advances in stochastic gradient descent in deep learning
In the age of artificial intelligence, finding the best way to handle huge amounts of data is a
tremendously motivating and hard problem. Among machine learning models, stochastic …
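Since every entry below builds on the basic SGD update, a minimal sketch on a toy least-squares problem (data, names, and step size are purely illustrative, not from the paper):

```python
import numpy as np

def sgd_step(w, grad_fn, batch, lr=0.01):
    """One SGD step: move against a gradient estimated on a mini-batch."""
    return w - lr * grad_fn(w, batch)

# Toy usage: least squares on random data (illustrative only).
rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 5)), rng.normal(size=100)
grad = lambda w, idx: 2 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)

w = np.zeros(5)
for _ in range(200):
    batch = rng.choice(100, size=10, replace=False)  # sample a mini-batch
    w = sgd_step(w, grad, batch)
```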
Variance-reduced methods for machine learning
Stochastic optimization lies at the heart of machine learning, and its cornerstone is
stochastic gradient descent (SGD), a method introduced over 60 years ago. The last eight …
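A minimal sketch of the variance-reduction idea in SVRG form, assuming a per-sample gradient oracle grad_i (all names and toy data are illustrative):

```python
import numpy as np

def svrg(w, grad_i, n, lr=0.05, epochs=5, inner=50, seed=0):
    """SVRG-style variance reduction: correct each stochastic gradient with a
    periodic snapshot of the full gradient, so variance vanishes near optima."""
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        snapshot = w.copy()
        full_grad = np.mean([grad_i(snapshot, i) for i in range(n)], axis=0)
        for _ in range(inner):
            i = rng.integers(n)
            v = grad_i(w, i) - grad_i(snapshot, i) + full_grad  # reduced-variance estimate
            w = w - lr * v
    return w

# Toy usage on least squares (illustrative data and oracle).
rng = np.random.default_rng(1)
X, y = rng.normal(size=(50, 3)), rng.normal(size=50)
w = svrg(np.zeros(3), lambda w, i: 2 * X[i] * (X[i] @ w - y[i]), n=50)
```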
AdaBelief optimizer: Adapting stepsizes by the belief in observed gradients
Most popular optimizers for deep learning can be broadly categorized as adaptive methods
(e.g., Adam) and accelerated schemes (e.g., stochastic gradient descent (SGD) with …
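A sketch of the core AdaBelief update: Adam-style moments, except the second moment tracks the deviation of the gradient from its EMA "belief" rather than the raw squared gradient (hyperparameters illustrative):

```python
import numpy as np

def adabelief_step(w, g, m, s, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One AdaBelief step: like Adam, but the second moment tracks (g - m)^2,
    the deviation of the gradient from its EMA 'belief', not g^2."""
    m = b1 * m + (1 - b1) * g
    s = b2 * s + (1 - b2) * (g - m) ** 2   # belief deviation, the key change vs. Adam
    m_hat = m / (1 - b1 ** t)              # bias corrections, as in Adam
    s_hat = s / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s
```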
SCAFFOLD: Stochastic controlled averaging for federated learning
Federated learning is a key scenario in modern large-scale machine learning where the
data remains distributed over a large number of clients and the task is to learn a centralized …
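A sketch of SCAFFOLD's drift-corrected local step, assuming grad_fn is a client's stochastic gradient oracle and c_i, c are the client and server control variates (the control-variate refresh follows the spirit of the paper's option II, not verbatim):

```python
import numpy as np

def scaffold_client_update(x, grad_fn, c_i, c, lr=0.1, local_steps=10):
    """SCAFFOLD-style local update: control variates (c_i client-side, c
    server-side) correct each local step so client drift is reduced."""
    y = x.copy()
    for _ in range(local_steps):
        y = y - lr * (grad_fn(y) - c_i + c)          # drift-corrected step
    # Control-variate refresh, sketched after the paper's option II.
    c_i_new = c_i - c + (x - y) / (local_steps * lr)
    return y, c_i_new
```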
Large batch optimization for deep learning: Training BERT in 76 minutes
Training large deep neural networks on massive datasets is computationally very
challenging. There has been a recent surge of interest in using large batch stochastic …
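A sketch of the layerwise trust-ratio update behind LAMB, the optimizer this paper introduces for large-batch BERT training (one layer shown; hyperparameters illustrative):

```python
import numpy as np

def lamb_layer_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-6, wd=0.01):
    """One LAMB step for a single layer: an Adam-style direction rescaled by a
    layerwise trust ratio ||w|| / ||direction||, stabilizing very large batches."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
    r = m_hat / (np.sqrt(v_hat) + eps) + wd * w      # Adam direction + weight decay
    trust = np.linalg.norm(w) / (np.linalg.norm(r) + eps)
    return w - lr * trust * r, m, v
```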
A survey of optimization methods from a machine learning perspective
Machine learning has developed rapidly, achieving many theoretical breakthroughs, and is
widely applied in various fields. Optimization, as an important part of machine learning, has …
Decentralized federated averaging
Federated averaging (FedAvg) is a communication-efficient algorithm for distributed training
with an enormous number of clients. In FedAvg, clients keep their data locally for privacy …
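A minimal sketch of one FedAvg round, assuming a hypothetical local_fn that runs a client's local training and returns its model weights (names illustrative):

```python
import numpy as np

def fedavg_round(global_w, client_data, local_fn):
    """One FedAvg round: each client trains locally on its own data; the server
    averages the returned models, weighted by client data size."""
    local_ws = [local_fn(global_w.copy(), d) for d in client_data]
    weights = np.array([len(d) for d in client_data], dtype=float)
    weights /= weights.sum()                         # weight by data size
    return sum(a * w for a, w in zip(weights, local_ws))
```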
Learning to reweight examples for robust deep learning
Deep neural networks have been shown to be very powerful modeling tools for many
supervised learning tasks involving complex input patterns. However, they can also easily …
SPIDER: Near-optimal non-convex optimization via stochastic path-integrated differential estimator
In this paper, we propose a new technique named Stochastic Path-Integrated
Differential EstimatoR (SPIDER), which can be used to track many deterministic quantities of …
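A sketch of SPIDER's path-integrated gradient tracking: occasional full-gradient refreshes, with low-variance gradient differences in between (grad_fn, q, and the batch size are illustrative assumptions):

```python
import numpy as np

def spider_estimates(grad_fn, n, x_path, q=10, batch=5, seed=0):
    """SPIDER-style tracking of the gradient along an iterate path: refresh with
    the full gradient every q steps, otherwise add gradient differences
    evaluated on the *same* mini-batch at x_t and x_{t-1}."""
    rng = np.random.default_rng(seed)
    v, vs = None, []
    for t, x in enumerate(x_path):
        if t % q == 0:
            v = np.mean([grad_fn(x, i) for i in range(n)], axis=0)  # full refresh
        else:
            idx = rng.choice(n, size=batch, replace=False)
            v = v + np.mean([grad_fn(x, i) - grad_fn(x_path[t - 1], i)
                             for i in idx], axis=0)  # path-integrated correction
        vs.append(v)
    return vs
```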
Derivative-free optimization methods
In many optimization problems arising from scientific, engineering and artificial intelligence
applications, objective and constraint functions are available only as the output of a black …
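A sketch of the simplest derivative-free building block, a forward-difference gradient estimate for a black-box f (step size h illustrative):

```python
import numpy as np

def fd_gradient(f, x, h=1e-6):
    """Forward-difference gradient estimate: a basic building block of many
    derivative-free methods when f is only available as a black box."""
    fx = f(x)
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - fx) / h
    return g

# Toy usage: recover the gradient of a quadratic, approx [2, -4, 6].
print(fd_gradient(lambda z: float(z @ z), np.array([1.0, -2.0, 3.0])))
```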