An efficient optimization technique for training deep neural networks
Deep learning is a sub-branch of artificial intelligence that acquires knowledge by training a
neural network. It has many applications in fields such as banking, the automobile industry …
Heavy ball neural ordinary differential equations
We propose heavy ball neural ordinary differential equations (HBNODEs), leveraging the
continuous limit of classical momentum-accelerated gradient descent, to improve neural …
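For orientation (standard background rather than a claim from the paper itself), the heavy ball ODE that such models build on couples the state with a damped velocity; a common form, with damping coefficient gamma and vector field f, is sketched below.

```latex
% Heavy ball ODE: continuous-time limit of momentum gradient descent.
% x(t) is the state, \gamma > 0 the damping (momentum) coefficient, and
% f the vector field (parameterized by a neural network in the HBNODE setting).
\[
  \ddot{x}(t) + \gamma\,\dot{x}(t) = f\bigl(x(t), t\bigr),
\]
% equivalently, as a first-order system with an auxiliary momentum state m(t):
\[
  \dot{x}(t) = m(t), \qquad \dot{m}(t) = -\gamma\, m(t) + f\bigl(x(t), t\bigr).
\]
```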
AdaSAM: Boosting sharpness-aware minimization with adaptive learning rate and momentum for training deep neural networks
The sharpness-aware minimization (SAM) optimizer has been extensively explored, as it can
generalize better when training deep neural networks by introducing extra perturbation steps …
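For context, here is a minimal sketch of the extra perturbation step that SAM-style optimizers add (plain SAM with a fixed radius, not the adaptive AdaSAM rule from the paper); the toy model, data, and radius rho are illustrative assumptions, written against PyTorch.

```python
# Minimal sketch of one SAM step (base SAM, not the AdaSAM variant):
# 1) perturb weights along the normalized gradient direction (radius rho),
# 2) recompute the gradient at the perturbed point,
# 3) step the base optimizer from the original weights with that gradient.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(10, 1)                 # toy model (illustrative)
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
x, y = torch.randn(32, 10), torch.randn(32, 1)
rho = 0.05                                     # perturbation radius (assumed value)

def loss_fn():
    return torch.nn.functional.mse_loss(model(x), y)

# -- first forward/backward: gradient at the current weights
loss_fn().backward()
grad_norm = torch.norm(torch.stack(
    [p.grad.norm() for p in model.parameters() if p.grad is not None]))
eps = []
with torch.no_grad():
    for p in model.parameters():
        e = rho * p.grad / (grad_norm + 1e-12)  # ascent direction
        p.add_(e)                               # w <- w + eps
        eps.append(e)
model.zero_grad()

# -- second forward/backward: gradient at the perturbed weights
loss_fn().backward()
with torch.no_grad():
    for p, e in zip(model.parameters(), eps):
        p.sub_(e)                               # restore the original weights
opt.step()                                      # update with the SAM gradient
opt.zero_grad()
```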
Stability and generalization of decentralized stochastic gradient descent
The stability and generalization of stochastic gradient-based methods provide valuable
insight into the algorithmic performance of machine learning models. As the …
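As background (standard decentralized SGD rather than a result of the paper), the iteration studied in this line of work has each node i mix its neighbors' iterates through a doubly stochastic matrix W = (w_ij) and take a local stochastic gradient step:

```latex
% Decentralized SGD: node i averages neighbors' iterates via a doubly
% stochastic mixing matrix W = (w_{ij}) and takes a local stochastic
% gradient step of size \eta on its own objective f_i.
\[
  x_i^{t+1} = \sum_{j=1}^{n} w_{ij}\, x_j^{t}
              - \eta\, \nabla f_i\bigl(x_i^{t}; \xi_i^{t}\bigr),
\]
% where \xi_i^t is the sample drawn by node i at iteration t.
```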
Distributed stochastic consensus optimization with momentum for nonconvex nonsmooth problems
While many distributed optimization algorithms have been proposed for solving smooth or
convex problems over networks, few of them can handle non-convex and non-smooth …
Accelerated gradient descent learning over multiple access fading channels
R Paul, Y Friedman, K Cohen - IEEE Journal on Selected Areas …, 2021 - ieeexplore.ieee.org
We consider a distributed learning problem in a wireless network consisting of distributed
edge devices and a parameter server (PS). The objective function is a sum of the edge …
SMG: A shuffling gradient-based method with momentum
We combine two advanced ideas widely used in optimization for machine learning: the
shuffling strategy and the momentum technique, to develop a novel shuffling gradient …
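The two ingredients can be illustrated with a generic loop that reshuffles the data once per epoch while carrying a momentum buffer across iterates; this is a rough sketch of the idea rather than the exact SMG update, and the least-squares objective, data, and constants below are assumptions.

```python
# Generic sketch of a shuffling gradient loop with a momentum buffer.
# Illustrates the two ingredients (reshuffle each epoch, keep momentum),
# not the exact SMG update from the paper.
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((100, 5))           # toy samples (assumed)
target = rng.standard_normal(100)
w = np.zeros(5)                                # model parameters
v = np.zeros(5)                                # momentum buffer
lr, beta = 0.01, 0.9                           # assumed step size and momentum

def grad_fn(w, i):
    """Per-sample least-squares gradient (illustrative objective)."""
    x, y = data[i], target[i]
    return (x @ w - y) * x

for epoch in range(10):
    perm = rng.permutation(len(data))          # reshuffle once per epoch
    for i in perm:
        g = grad_fn(w, i)
        v = beta * v + (1.0 - beta) * g        # momentum average of gradients
        w = w - lr * v                         # parameter update
```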
Training deep neural networks with adaptive momentum inspired by the quadratic optimization
Heavy ball momentum is crucial in accelerating (stochastic) gradient-based optimization
algorithms for machine learning. Existing heavy ball momentum is usually weighted by a …
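For reference, the classical (Polyak) heavy ball update with a constant momentum weight beta, which adaptive-momentum schemes such as this one aim to replace with an adaptively chosen weight, reads:

```latex
% Classical heavy ball (Polyak) momentum with a constant weight \beta and
% step size \eta; adaptive-momentum methods choose \beta per iteration.
\[
  x^{k+1} = x^{k} - \eta\, \nabla f(x^{k}) + \beta\,\bigl(x^{k} - x^{k-1}\bigr).
\]
```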
MomentumRNN: Integrating momentum into recurrent neural networks
Designing deep neural networks is an art that often involves an expensive search over
candidate architectures. To overcome this for recurrent neural nets (RNNs), we establish a …
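As a rough illustration of the general idea of integrating momentum into a recurrent update (the exact MomentumRNN parameterization in the paper may differ), a recurrent cell can carry a momentum state v alongside the hidden state h; the cell below is a hedged sketch in PyTorch with made-up constants.

```python
# Rough sketch of a recurrent cell that carries a momentum state alongside
# the hidden state (the general idea behind momentum-augmented RNNs).
import torch

class MomentumCell(torch.nn.Module):
    def __init__(self, input_size, hidden_size, mu=0.9, step=0.6):
        super().__init__()
        self.U = torch.nn.Linear(input_size, hidden_size, bias=False)
        self.W = torch.nn.Linear(hidden_size, hidden_size)
        self.mu, self.step = mu, step          # assumed momentum and step constants

    def forward(self, x, h, v):
        v = self.mu * v + self.step * self.U(x)   # momentum accumulation on the input drive
        h = torch.tanh(self.W(h) + v)             # hidden-state update
        return h, v

# Usage on a toy sequence of shape (time, batch, features)
cell = MomentumCell(8, 16)
h = torch.zeros(4, 16)
v = torch.zeros(4, 16)
for x_t in torch.randn(5, 4, 8):
    h, v = cell(x_t, h, v)
```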
Improving neural ordinary differential equations with Nesterov's accelerated gradient method
We propose the Nesterov neural ordinary differential equations (NesterovNODEs), whose
layers solve the second-order ordinary differential equations (ODEs) limit of Nesterov's …
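For background, the second-order ODE limit of Nesterov's accelerated gradient (in the Su-Boyd-Candès form) that such layers build on is commonly written as:

```latex
% Second-order ODE limit of Nesterov's accelerated gradient descent
% (Su-Boyd-Candes form). In the NODE setting, the gradient term \nabla f
% is replaced by a learned vector field.
\[
  \ddot{x}(t) + \frac{3}{t}\,\dot{x}(t) + \nabla f\bigl(x(t)\bigr) = 0 .
\]
```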