An efficient optimization technique for training deep neural networks

F Mehmood, S Ahmad, TK Whangbo - Mathematics, 2023 - mdpi.com
Deep learning is a sub-branch of artificial intelligence that acquires knowledge by training a
neural network. It has many applications in fields such as banking, the automobile industry …

Heavy ball neural ordinary differential equations

H Xia, V Suliafu, H Ji, T Nguyen… - Advances in …, 2021 - proceedings.neurips.cc
We propose heavy ball neural ordinary differential equations (HBNODEs), leveraging the
continuous limit of the classical momentum accelerated gradient descent, to improve neural …
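
The continuous limit referred to here is the heavy-ball ODE x'' + gamma*x' = -grad f(x); discretizing it recovers the classical momentum update. A minimal NumPy sketch on a toy quadratic (illustrative only, not the authors' HBNODE implementation; the objective, gamma, and step size are arbitrary choices):

    import numpy as np

    # Illustrative quadratic objective f(x) = 0.5 * x^T A x and its gradient.
    A = np.diag([1.0, 10.0])
    grad = lambda x: A @ x

    # Heavy-ball ODE: x'' + gamma * x' = -grad f(x), discretized with a simple Euler step.
    # With step h this reduces to the familiar momentum update
    #   v <- (1 - h*gamma) * v - h * grad(x);   x <- x + h * v
    gamma, h = 2.0, 0.05
    x, v = np.array([5.0, 3.0]), np.zeros(2)
    for _ in range(500):
        v = (1.0 - h * gamma) * v - h * grad(x)
        x = x + h * v
    print(x)  # approaches the minimizer at the origin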

AdaSAM: Boosting sharpness-aware minimization with adaptive learning rate and momentum for training deep neural networks

H Sun, L Shen, Q Zhong, L Ding, S Chen, J Sun, J Li… - Neural Networks, 2024 - Elsevier
The sharpness-aware minimization (SAM) optimizer has been extensively explored, as it generalizes
better when training deep neural networks by introducing extra perturbation steps …
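
The basic SAM step that AdaSAM builds on is a two-stage update: first perturb the weights along the gradient direction (scaled to radius rho) to reach a nearby high-loss point, then descend using the gradient measured at that perturbed point. A minimal PyTorch sketch of plain SAM with an SGD base step (illustrative; AdaSAM replaces the fixed learning rate below with an adaptive, momentum-based update, which is not reproduced here):

    import torch

    def sam_step(params, loss_fn, rho=0.05, lr=0.1):
        """One plain SAM update: perturb weights uphill by rho, then take the
        descent step using the gradient measured at the perturbed point."""
        # 1) gradient at the current weights
        loss = loss_fn()
        grads = torch.autograd.grad(loss, params)
        grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12

        # 2) ascend to the nearby high-loss point w + rho * g / ||g||
        eps = [rho * g / grad_norm for g in grads]
        with torch.no_grad():
            for p, e in zip(params, eps):
                p.add_(e)

        # 3) gradient at the perturbed weights, then undo the perturbation and descend
        loss_perturbed = loss_fn()
        grads_p = torch.autograd.grad(loss_perturbed, params)
        with torch.no_grad():
            for p, e, g in zip(params, eps, grads_p):
                p.sub_(e)          # back to the original weights
                p.sub_(lr * g)     # SGD step with the SAM gradient

    # Hypothetical usage: params = list(model.parameters()),
    # loss_fn = lambda: criterion(model(x), y), called once per minibatch.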

Stability and generalization of decentralized stochastic gradient descent

T Sun, D Li, B Wang - Proceedings of the AAAI Conference on Artificial …, 2021 - ojs.aaai.org
The stability and generalization of stochastic gradient-based methods provide valuable
insights into understanding the algorithmic performance of machine learning models. As the …

Distributed stochastic consensus optimization with momentum for nonconvex nonsmooth problems

Z Wang, J Zhang, TH Chang, J Li… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
While many distributed optimization algorithms have been proposed for solving smooth or
convex problems over networks, few of them can handle non-convex and non-smooth …
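
Methods in this family typically alternate a consensus (gossip) averaging step over a mixing matrix with a local momentum-corrected stochastic gradient step. A toy NumPy sketch of that template on per-agent quadratics (purely illustrative; the paper's algorithm additionally handles non-smooth terms, which this sketch omits):

    import numpy as np

    n, d = 4, 3                        # 4 agents, 3-dimensional decision variable
    rng = np.random.default_rng(0)
    targets = rng.normal(size=(n, d))  # agent i minimizes 0.5 * ||x - targets[i]||^2
    grad = lambda i, x: x - targets[i]

    # Doubly stochastic mixing matrix for a ring of 4 agents.
    W = np.array([[0.50, 0.25, 0.00, 0.25],
                  [0.25, 0.50, 0.25, 0.00],
                  [0.00, 0.25, 0.50, 0.25],
                  [0.25, 0.00, 0.25, 0.50]])

    X = np.zeros((n, d))               # agents' local copies of the decision variable
    V = np.zeros((n, d))               # local momentum buffers
    lr, beta = 0.1, 0.9
    for _ in range(300):
        X = W @ X                      # consensus: average with neighbors
        G = np.stack([grad(i, X[i]) for i in range(n)])
        V = beta * V + G               # heavy-ball momentum on local gradients
        X = X - lr * V
    print(X.mean(axis=0), targets.mean(axis=0))  # local copies approach the average minimizer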

Accelerated gradient descent learning over multiple access fading channels

R Paul, Y Friedman, K Cohen - IEEE Journal on Selected Areas …, 2021 - ieeexplore.ieee.org
We consider a distributed learning problem in a wireless network, consisting of distributed
edge devices and a parameter server (PS). The objective function is a sum of the edge …
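
The objective is the usual parameter-server form F(w) = f_1(w) + ... + f_n(w): each edge device computes a local gradient, the server receives their (noisy) sum over the channel and applies a momentum-accelerated update. A toy NumPy sketch with additive noise standing in for the fading multiple-access channel (illustrative assumptions throughout; this is not the paper's channel model or algorithm):

    import numpy as np

    rng = np.random.default_rng(1)
    n, d = 8, 5
    A = [rng.normal(size=(d, d)) for _ in range(n)]
    A = [a.T @ a + np.eye(d) for a in A]          # device i holds f_i(w) = 0.5 * w^T A_i w
    local_grad = lambda i, w: A[i] @ w

    w, v = rng.normal(size=d), np.zeros(d)
    lr, beta, noise_std = 0.01, 0.9, 0.01
    for _ in range(400):
        g_sum = sum(local_grad(i, w) for i in range(n))
        g_sum += noise_std * rng.normal(size=d)   # channel noise in the aggregated sum
        v = beta * v + g_sum / n                  # momentum on the averaged gradient
        w = w - lr * v
    print(np.linalg.norm(w))   # decays toward zero, the minimizer of the summed quadratics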

SMG: A shuffling gradient-based method with momentum

TH Tran, LM Nguyen… - … Conference on Machine …, 2021 - proceedings.mlr.press
We combine two advanced ideas widely used in optimization for machine learning: the shuffling
strategy and the momentum technique, to develop a novel shuffling gradient …
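
The two ingredients are simple to state: every epoch visits the training samples in a fresh random permutation (shuffling), and the search direction blends the current sample gradient with the previous direction (momentum). A small NumPy sketch of that combination on least squares (a generic illustration; the exact SMG recursion differs in how the momentum term is maintained across an epoch):

    import numpy as np

    rng = np.random.default_rng(2)
    n, d = 256, 10
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

    w = np.zeros(d)
    v = np.zeros(d)
    lr, beta = 0.05, 0.9
    for epoch in range(30):
        for i in rng.permutation(n):          # shuffling: a fresh pass order every epoch
            g = (X[i] @ w - y[i]) * X[i]      # per-sample least-squares gradient
            v = beta * v + (1 - beta) * g     # momentum: mix with the previous direction
            w = w - lr * v
    print(np.linalg.norm(X @ w - y) / np.sqrt(n))   # residual shrinks toward the noise level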

Training deep neural networks with adaptive momentum inspired by the quadratic optimization

T Sun, H Ling, Z Shi, D Li, B Wang - arXiv preprint arXiv:2110.09057, 2021 - arxiv.org
Heavy ball momentum is crucial in accelerating (stochastic) gradient-based optimization
algorithms for machine learning. Existing heavy ball momentum is usually weighted by a …
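
The quadratic case that inspires the method is the one where the best constant heavy-ball weight is known in closed form: for a quadratic with smallest and largest curvatures mu and L, the classical optimal choices are beta = ((sqrt(L) - sqrt(mu)) / (sqrt(L) + sqrt(mu)))^2 and lr = 4 / (sqrt(L) + sqrt(mu))^2. A short NumPy sketch of heavy ball with these textbook parameters (the paper's adaptive, per-iteration rule for general problems is not reproduced here):

    import numpy as np

    # Quadratic f(x) = 0.5 * x^T A x with curvatures between mu and L.
    mu, L = 1.0, 100.0
    A = np.diag(np.linspace(mu, L, 20))
    grad = lambda x: A @ x

    # Classical quadratic-optimal heavy-ball parameters.
    beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2
    lr = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2

    x = np.ones(20)
    x_prev = x.copy()
    for _ in range(200):
        x, x_prev = x - lr * grad(x) + beta * (x - x_prev), x
    print(np.linalg.norm(x))  # linear convergence at rate about (sqrt(L)-sqrt(mu))/(sqrt(L)+sqrt(mu))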

MomentumRNN: Integrating momentum into recurrent neural networks

T Nguyen, R Baraniuk, A Bertozzi… - Advances in neural …, 2020 - proceedings.neurips.cc
Designing deep neural networks is an art that often involves an expensive search over
candidate architectures. To overcome this for recurrent neural nets (RNNs), we establish a …
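
The paper's analogy treats the recurrent hidden-state update like a gradient step, so a momentum buffer can be carried alongside the hidden state. A toy PyTorch sketch of a momentum-augmented tanh RNN cell (the parameterization here is a simplified illustration, not the exact MomentumRNN cell):

    import torch
    import torch.nn as nn

    class MomentumRNNCell(nn.Module):
        """Vanilla tanh RNN cell with a momentum buffer on the input-driven update."""
        def __init__(self, input_size, hidden_size, mu=0.9, eps=0.6):
            super().__init__()
            self.Wx = nn.Linear(input_size, hidden_size)
            self.Wh = nn.Linear(hidden_size, hidden_size, bias=False)
            self.mu, self.eps = mu, eps       # momentum and step-size constants

        def forward(self, x_t, state):
            h_prev, v_prev = state
            v_t = self.mu * v_prev + self.eps * self.Wx(x_t)   # momentum on the input drive
            h_t = torch.tanh(self.Wh(h_prev) + v_t)
            return h_t, (h_t, v_t)

    # One step on a random batch (sizes are arbitrary).
    cell = MomentumRNNCell(input_size=16, hidden_size=32)
    x = torch.randn(4, 16)
    h0, v0 = torch.zeros(4, 32), torch.zeros(4, 32)
    h1, state = cell(x, (h0, v0))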

Improving neural ordinary differential equations with Nesterov's accelerated gradient method

HHN Nguyen, T Nguyen, H Vo… - Advances in Neural …, 2022 - proceedings.neurips.cc
We propose the Nesterov neural ordinary differential equations (NesterovNODEs), whose
layers solve the second-order ordinary differential equation (ODE) limit of Nesterov's …
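
The second-order ODE in question is the continuous limit of Nesterov's accelerated gradient identified by Su, Boyd, and Candès: x'' + (3/t) x' + grad f(x) = 0. A small NumPy sketch of the discrete Nesterov recursion whose limit that ODE describes, on a toy quadratic (illustrative only, not the NesterovNODE architecture):

    import numpy as np

    A = np.diag([1.0, 25.0])
    grad = lambda x: A @ x

    # Discrete Nesterov accelerated gradient on f(x) = 0.5 * x^T A x;
    # its continuous limit is the ODE x'' + (3/t) x' + grad f(x) = 0.
    lr = 1.0 / 25.0                                # 1/L for this quadratic
    x = np.array([3.0, 2.0])
    x_prev = x.copy()
    for k in range(1, 300):
        y = x + (k - 1) / (k + 2) * (x - x_prev)   # Nesterov look-ahead extrapolation
        x, x_prev = y - lr * grad(y), x
    print(np.linalg.norm(x))  # O(1/k^2) decay in f, much faster than plain gradient descent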