Implicit regularization in deep learning may not be explainable by norms
Mathematically characterizing the implicit regularization induced by gradient-based
optimization is a longstanding pursuit in the theory of deep learning. A widespread hope is …
The implicit bias of minima stability: A view from function space
R Mulayoff, T Michaeli… - Advances in Neural …, 2021 - proceedings.neurips.cc
The loss landscapes of over-parameterized neural networks have multiple global minima.
However, it is well known that stochastic gradient descent (SGD) can stably converge only to …
Unique properties of flat minima in deep networks
R Mulayoff, T Michaeli - International conference on machine …, 2020 - proceedings.mlr.press
It is well known that (stochastic) gradient descent has an implicit bias towards flat minima. In
deep neural network training, this mechanism serves to screen out minima. However, the …
How much does initialization affect generalization?
S Ramasinghe, LE MacDonald… - International …, 2023 - proceedings.mlr.press
Characterizing the remarkable generalization properties of over-parameterized neural
networks remains an open problem. A growing body of recent literature shows that the bias …
A correspondence between normalization strategies in artificial and biological neural networks
A fundamental challenge at the interface of machine learning and neuroscience is to
uncover computational principles that are shared between artificial and biological neural …
Deep learning from a statistical perspective
As one of the most rapidly developing artificial intelligence techniques, deep learning has
been applied in various machine learning tasks and has received great attention in data …
Stationary probability distributions of stochastic gradient descent and the success and failure of the diffusion approximation
WJ McCann - 2021 - digitalcommons.njit.edu
In this thesis, Stochastic Gradient Descent (SGD), an optimization method originally
popular due to its computational efficiency, is analyzed using Markov chain methods. We …
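The last entry views SGD as a Markov chain. As a minimal illustrative sketch (not code from the thesis), SGD on the quadratic loss f(θ) = θ²/2 with additive Gaussian gradient noise is a linear AR(1) Markov chain whose stationary variance has the closed form η σ² / (2 − η), which a simulation can check directly:

```python
import random

def sgd_quadratic_stationary_variance(eta=0.1, sigma=1.0, steps=200000, seed=0):
    """Estimate the stationary variance of SGD on f(theta) = theta**2 / 2.

    Update rule with noisy gradient g_t = theta_t + sigma * z_t, z_t ~ N(0, 1):
        theta_{t+1} = theta_t - eta * g_t = (1 - eta) * theta_t - eta * sigma * z_t
    This is an AR(1) chain; its stationary variance is
        eta**2 * sigma**2 / (1 - (1 - eta)**2) = eta * sigma**2 / (2 - eta).
    """
    rng = random.Random(seed)
    theta = 0.0
    burn_in = steps // 10          # discard the transient before averaging
    second_moment, n = 0.0, 0
    for t in range(steps):
        noisy_grad = theta + sigma * rng.gauss(0.0, 1.0)
        theta -= eta * noisy_grad
        if t >= burn_in:
            second_moment += theta * theta
            n += 1
    return second_moment / n

empirical = sgd_quadratic_stationary_variance()
theory = 0.1 * 1.0 / (2 - 0.1)   # eta * sigma^2 / (2 - eta)
```

The closed form exists only because the loss is quadratic and the noise is state-independent; the thesis's point is precisely that such diffusion approximations can fail outside this idealized regime.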