Implicit regularization in deep learning may not be explainable by norms

N Razin, N Cohen - Advances in Neural Information Processing Systems, 2020 - proceedings.neurips.cc
Mathematically characterizing the implicit regularization induced by gradient-based
optimization is a longstanding pursuit in the theory of deep learning. A widespread hope is …
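
To make the "widespread hope" concrete, here is the standard conjecture from the matrix factorization literature that this paper challenges (background context, not a quote from the paper): gradient descent on a deep matrix factorization $W = W_N W_{N-1} \cdots W_1$ is conjectured to converge, among all solutions fitting the observed entries $\{b_{ij}\}_{(i,j)\in\Omega}$, to the one of minimum nuclear norm:

\[
\min_{W} \; \|W\|_{*} \quad \text{s.t.} \quad W_{ij} = b_{ij} \;\; \forall (i,j) \in \Omega .
\]

As the title indicates, the paper argues that no characterization of this norm-minimization form can hold in general.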

The implicit bias of minima stability: A view from function space

R Mulayoff, T Michaeli… - Advances in Neural Information Processing Systems, 2021 - proceedings.neurips.cc
The loss landscapes of over-parameterized neural networks have multiple global minima.
However, it is well known that stochastic gradient descent (SGD) can stably converge only to …
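
As a worked illustration of the stability claim above (a standard linear-stability argument, not the paper's full analysis): for plain gradient descent $\theta_{t+1} = \theta_t - \eta \nabla L(\theta_t)$ with step size $\eta$, a twice-differentiable minimum $\theta^*$ is linearly stable only if every Hessian eigenvalue $\lambda$ satisfies $|1 - \eta\lambda| \le 1$, i.e.,

\[
\lambda_{\max}\!\big(\nabla^2 L(\theta^*)\big) \le \frac{2}{\eta},
\]

so larger step sizes can stably retain only flatter minima; the paper develops a function-space view of the analogous condition for SGD.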

Unique properties of flat minima in deep networks

R Mulayoff, T Michaeli - International Conference on Machine Learning, 2020 - proceedings.mlr.press
It is well known that (stochastic) gradient descent has an implicit bias towards flat minima. In
deep neural network training, this mechanism serves to screen out sharp minima. However, the …

How much does initialization affect generalization?

S Ramasinghe, LE MacDonald… - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Characterizing the remarkable generalization properties of over-parameterized neural
networks remains an open problem. A growing body of recent literature shows that the bias …

A correspondence between normalization strategies in artificial and biological neural networks

Y Shen, J Wang, S Navlakha - Neural Computation, 2021 - direct.mit.edu
A fundamental challenge at the interface of machine learning and neuroscience is to
uncover computational principles that are shared between artificial and biological neural …

Deep learning from a statistical perspective

Y Yuan, Y Deng, Y Zhang, A Qu - Stat, 2020 - Wiley Online Library
As one of the most rapidly developing artificial intelligence techniques, deep learning has
been applied to various machine learning tasks and has received great attention in data …

Stationary probability distributions of stochastic gradient descent and the success and failure of the diffusion approximation

WJ McCann - 2021 - digitalcommons.njit.edu
In this thesis, Stochastic Gradient Descent (SGD), an optimization method originally
popular due to its computational efficiency, is analyzed using Markov chain methods. We …
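
For context on the diffusion approximation named in the title (the standard construction, not necessarily the thesis's exact setup): SGD with step size $\eta$ and minibatch gradient noise covariance $\Sigma(\theta)$ is commonly modeled by the stochastic differential equation

\[
d\theta_t = -\nabla L(\theta_t)\, dt + \sqrt{\eta\, \Sigma(\theta_t)}\, dW_t ,
\]

which, in the constant isotropic case $\Sigma = \sigma^2 I$, has the Gibbs stationary density $p(\theta) \propto \exp\!\big(-2L(\theta)/(\eta\sigma^2)\big)$; the thesis's Markov chain analysis asks when this approximation to SGD's true stationary distribution succeeds and when it fails.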