An efficient optimization technique for training deep neural networks
Deep learning is a sub-branch of artificial intelligence that acquires knowledge by training a
neural network. It has many applications in fields such as banking, the automobile industry …
Heavy ball neural ordinary differential equations
We propose heavy ball neural ordinary differential equations (HBNODEs), leveraging the
continuous limit of classical momentum-accelerated gradient descent, to improve neural …
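For orientation (standard background rather than a claim from the paper itself), the heavy ball ODE that such models build on couples the state with a damped velocity; a common form, with damping coefficient gamma and vector field f, is sketched below.

```latex
% Heavy ball ODE: continuous-time limit of momentum gradient descent.
% x(t) is the state, \gamma > 0 the damping (momentum) coefficient, and
% f the vector field (parameterized by a neural network in the HBNODE setting).
\[
  \ddot{x}(t) + \gamma\,\dot{x}(t) = f\bigl(x(t), t\bigr),
\]
% equivalently, as a first-order system with an auxiliary momentum state m(t):
\[
  \dot{x}(t) = m(t), \qquad \dot{m}(t) = -\gamma\, m(t) + f\bigl(x(t), t\bigr).
\]
```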
AdaSAM: Boosting sharpness-aware minimization with adaptive learning rate and momentum for training deep neural networks
The sharpness-aware minimization (SAM) optimizer has been extensively explored, as it can
generalize better when training deep neural networks by introducing extra perturbation steps …
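For context, here is a minimal sketch of the extra perturbation step that SAM-style optimizers add (plain SAM with a fixed radius, not the adaptive AdaSAM rule from the paper); the toy model, data, and radius rho are illustrative assumptions, written against PyTorch.

```python
# Minimal sketch of one SAM step (base SAM, not the AdaSAM variant):
# 1) perturb weights along the normalized gradient direction (radius rho),
# 2) recompute the gradient at the perturbed point,
# 3) step the base optimizer from the original weights with that gradient.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(10, 1)                 # toy model (illustrative)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
x, y = torch.randn(32, 10), torch.randn(32, 1)
rho = 0.05                                     # perturbation radius (assumed value)

def loss_fn():
    return torch.nn.functional.mse_loss(model(x), y)

# -- first forward/backward: gradient at the current weights
loss_fn().backward()
grad_norm = torch.norm(torch.stack(
    [p.grad.norm() for p in model.parameters() if p.grad is not None]))
eps = []
with torch.no_grad():
    for p in model.parameters():
        e = rho * p.grad / (grad_norm + 1e-12)  # ascent direction
        p.add_(e)                               # w <- w + eps
        eps.append(e)
model.zero_grad()

# -- second forward/backward: gradient at the perturbed weights
loss_fn().backward()
with torch.no_grad():
    for p, e in zip(model.parameters(), eps):
        p.sub_(e)                               # restore the original weights
opt.step()                                      # update with the SAM gradient
opt.zero_grad()
```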
Stability and generalization of decentralized stochastic gradient descent
The stability and generalization of stochastic gradient-based methods provide valuable
insight into the algorithmic performance of machine learning models. As the …
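As background (standard decentralized SGD rather than a result of the paper), the iteration studied in this line of work has each node i mix its neighbors' iterates through a doubly stochastic matrix W = (w_ij) and take a local stochastic gradient step:

```latex
% Decentralized SGD: node i averages neighbors' iterates via a doubly
% stochastic mixing matrix W = (w_{ij}) and takes a local stochastic
% gradient step of size \eta on its own objective f_i.
\[
  x_i^{t+1} = \sum_{j=1}^{n} w_{ij}\, x_j^{t}
              - \eta\, \nabla f_i\bigl(x_i^{t}; \xi_i^{t}\bigr),
\]
% where \xi_i^t is the sample drawn by node i at iteration t.
```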
Distributed stochastic consensus optimization with momentum for nonconvex nonsmooth problems
While many distributed optimization algorithms have been proposed for solving smooth or
convex problems over networks, few of them can handle non-convex and non-smooth …
Accelerated gradient descent learning over multiple access fading channels
R Paul, Y Friedman, K Cohen - IEEE Journal on Selected Areas …, 2021 - ieeexplore.ieee.org
We consider a distributed learning problem in a wireless network consisting of distributed
edge devices and a parameter server (PS). The objective function is a sum of the edge …
SMG: A shuffling gradient-based method with momentum
We combine two advanced ideas widely used in optimization for machine learning: the
shuffling strategy and the momentum technique, to develop a novel shuffling gradient …
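The two ingredients can be illustrated with a generic loop that reshuffles the data once per epoch while carrying a momentum buffer across iterates; this is a rough sketch of the idea rather than the exact SMG update, and the least-squares objective, data, and constants below are assumptions.

```python
# Generic sketch of a shuffling gradient loop with a momentum buffer.
# Illustrates the two ingredients (reshuffle each epoch, keep momentum),
# not the exact SMG update from the paper.
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((100, 5))           # toy samples (assumed)
target = rng.standard_normal(100)
w = np.zeros(5)                                # model parameters
v = np.zeros(5)                                # momentum buffer
lr, beta = 0.01, 0.9                           # assumed step size and momentum

def grad_fn(w, i):
    """Per-sample least-squares gradient (illustrative objective)."""
    x, y = data[i], target[i]
    return (x @ w - y) * x

for epoch in range(10):
    perm = rng.permutation(len(data))          # reshuffle once per epoch
    for i in perm:
        g = grad_fn(w, i)
        v = beta * v + (1.0 - beta) * g        # momentum average of gradients
        w = w - lr * v                         # parameter update
```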
Training deep neural networks with adaptive momentum inspired by the quadratic optimization
Heavy ball momentum is crucial in accelerating (stochastic) gradient-based optimization
algorithms for machine learning. Existing heavy ball momentum is usually weighted by a …
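For reference, the classical (Polyak) heavy ball update with a constant momentum weight beta, which adaptive-momentum schemes such as this one aim to replace with an adaptively chosen weight, reads:

```latex
% Classical heavy ball (Polyak) momentum with a constant weight \beta and
% step size \eta; adaptive-momentum methods choose \beta per iteration.
\[
  x^{k+1} = x^{k} - \eta\, \nabla f(x^{k}) + \beta\,\bigl(x^{k} - x^{k-1}\bigr).
\]
```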
MomentumRNN: Integrating momentum into recurrent neural networks
Designing deep neural networks is an art that often involves an expensive search over
candidate architectures. To overcome this for recurrent neural nets (RNNs), we establish a …
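As a rough illustration of the general idea of integrating momentum into a recurrent update (the exact MomentumRNN parameterization in the paper may differ), a recurrent cell can carry a momentum state v alongside the hidden state h; the cell below is a hedged sketch in PyTorch with made-up constants.

```python
# Rough sketch of a recurrent cell that carries a momentum state alongside
# the hidden state (the general idea behind momentum-augmented RNNs).
import torch

class MomentumCell(torch.nn.Module):
    def __init__(self, input_size, hidden_size, mu=0.9, step=0.6):
        super().__init__()
        self.U = torch.nn.Linear(input_size, hidden_size, bias=False)
        self.W = torch.nn.Linear(hidden_size, hidden_size)
        self.mu, self.step = mu, step          # assumed momentum and step constants

    def forward(self, x, h, v):
        v = self.mu * v + self.step * self.U(x)   # momentum accumulation on the input drive
        h = torch.tanh(self.W(h) + v)             # hidden-state update
        return h, v

# Usage on a toy sequence of shape (time, batch, features)
cell = MomentumCell(8, 16)
h = torch.zeros(4, 16)
v = torch.zeros(4, 16)
for x_t in torch.randn(5, 4, 8):
    h, v = cell(x_t, h, v)
```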
Improving neural ordinary differential equations with Nesterov's accelerated gradient method
We propose the Nesterov neural ordinary differential equations (NesterovNODEs), whose
layers solve the second-order ordinary differential equations (ODEs) limit of Nesterov's …
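For background, the second-order ODE limit of Nesterov's accelerated gradient (in the Su-Boyd-Candès form) that such layers build on is commonly written as:

```latex
% Second-order ODE limit of Nesterov's accelerated gradient descent
% (Su-Boyd-Candes form). In the NODE setting, the gradient term \nabla f
% is replaced by a learned vector field.
\[
  \ddot{x}(t) + \frac{3}{t}\,\dot{x}(t) + \nabla f\bigl(x(t)\bigr) = 0 .
\]
```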