Implicit regularization in deep learning may not be explainable by norms

N Razin, N Cohen - Advances in Neural Information Processing Systems, 2020 - proceedings.neurips.cc
Mathematically characterizing the implicit regularization induced by gradient-based
optimization is a longstanding pursuit in the theory of deep learning. A widespread hope is …
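
To make the "widespread hope" concrete, here is the standard conjecture from the matrix factorization literature that this paper challenges (background context, not a quote from the paper): gradient descent on a deep matrix factorization $W = W_N W_{N-1} \cdots W_1$ is conjectured to converge, among all solutions fitting the observed entries $\{b_{ij}\}_{(i,j)\in\Omega}$, to the one of minimum nuclear norm:

\[
\min_{W} \; \|W\|_{*} \quad \text{s.t.} \quad W_{ij} = b_{ij} \;\; \forall (i,j) \in \Omega .
\]

As the title indicates, the paper argues that no characterization of this norm-minimization form can hold in general.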

The implicit bias of minima stability: A view from function space

R Mulayoff, T Michaeli… - Advances in Neural Information Processing Systems, 2021 - proceedings.neurips.cc
The loss landscapes of over-parameterized neural networks have multiple global minima.
However, it is well known that stochastic gradient descent (SGD) can stably converge only to …
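
As a worked illustration of the stability claim above (a standard linear-stability argument, not the paper's full analysis): for plain gradient descent $\theta_{t+1} = \theta_t - \eta \nabla L(\theta_t)$ with step size $\eta$, a twice-differentiable minimum $\theta^*$ is linearly stable only if every Hessian eigenvalue $\lambda$ satisfies $|1 - \eta\lambda| \le 1$, i.e.,

\[
\lambda_{\max}\!\big(\nabla^2 L(\theta^*)\big) \le \frac{2}{\eta},
\]

so larger step sizes can stably retain only flatter minima; the paper develops a function-space view of the analogous condition for SGD.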

Unique properties of flat minima in deep networks

R Mulayoff, T Michaeli - International Conference on Machine Learning, 2020 - proceedings.mlr.press
It is well known that (stochastic) gradient descent has an implicit bias towards flat minima. In
deep neural network training, this mechanism serves to screen out sharp minima. However, the …

How much does initialization affect generalization?

S Ramasinghe, LE MacDonald… - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Characterizing the remarkable generalization properties of over-parameterized neural
networks remains an open problem. A growing body of recent literature shows that the bias …

A correspondence between normalization strategies in artificial and biological neural networks

Y Shen, J Wang, S Navlakha - Neural Computation, 2021 - direct.mit.edu
A fundamental challenge at the interface of machine learning and neuroscience is to
uncover computational principles that are shared between artificial and biological neural …

Deep learning from a statistical perspective

Y Yuan, Y Deng, Y Zhang, A Qu - Stat, 2020 - Wiley Online Library
As one of the most rapidly developing artificial intelligence techniques, deep learning has
been applied to various machine learning tasks and has received great attention in data …

Stationary probability distributions of stochastic gradient descent and the success and failure of the diffusion approximation

WJ McCann - 2021 - digitalcommons.njit.edu
In this thesis, Stochastic Gradient Descent (SGD), an optimization method originally
popular due to its computational efficiency, is analyzed using Markov chain methods. We …
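
For context on the diffusion approximation named in the title (the standard construction, not necessarily the thesis's exact setup): SGD with step size $\eta$ and minibatch gradient noise covariance $\Sigma(\theta)$ is commonly modeled by the stochastic differential equation

\[
d\theta_t = -\nabla L(\theta_t)\, dt + \sqrt{\eta\, \Sigma(\theta_t)}\, dW_t ,
\]

which, in the constant isotropic case $\Sigma = \sigma^2 I$, has the Gibbs stationary density $p(\theta) \propto \exp\!\big(-2L(\theta)/(\eta\sigma^2)\big)$; the thesis's Markov chain analysis asks when this approximation to SGD's true stationary distribution succeeds and when it fails.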