On the implicit bias in deep-learning algorithms

G Vardi - Communications of the ACM, 2023 - dl.acm.org
Deep learning has been highly successful in recent years and has led to dramatic improvements in multiple domains …

Understanding gradient descent on the edge of stability in deep learning

S Arora, Z Li, A Panigrahi - International Conference on …, 2022 - proceedings.mlr.press
Deep learning experiments by Cohen et al. (2021) using deterministic Gradient Descent (GD) revealed an Edge of Stability (EoS) phase when learning rate (LR) and …

Understanding the generalization benefit of normalization layers: Sharpness reduction

K Lyu, Z Li, S Arora - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Normalization layers (e.g., Batch Normalization, Layer Normalization) were introduced to help with optimization difficulties in very deep nets, but they clearly also help …

SGD with large step sizes learns sparse features

M Andriushchenko, AV Varre… - International …, 2023 - proceedings.mlr.press
We showcase important features of the dynamics of Stochastic Gradient Descent (SGD) in the training of neural networks. We present empirical observations that commonly used …

(S)GD over Diagonal Linear Networks: Implicit Bias, Large Stepsizes and Edge of Stability

M Even, S Pesme, S Gunasekar… - Advances in Neural …, 2023 - proceedings.neurips.cc
In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over 2 …

Learning threshold neurons via edge of stability

K Ahn, S Bubeck, S Chewi, YT Lee… - Advances in Neural …, 2023 - proceedings.neurips.cc
Existing analyses of neural network training often operate under the unrealistic assumption
of an extremely small learning rate. This lies in stark contrast to practical wisdom and …

Understanding edge-of-stability training dynamics with a minimalist example

X Zhu, Z Wang, X Wang, M Zhou, R Ge - arXiv preprint arXiv:2210.03294, 2022 - arxiv.org
Recently, researchers observed that gradient descent for deep neural networks operates in an "edge-of-stability" (EoS) regime: the sharpness (maximum eigenvalue of the Hessian) is …

Implicit bias of gradient descent for logistic regression at the edge of stability

J Wu, V Braverman, JD Lee - Advances in Neural …, 2024 - proceedings.neurips.cc
Recent research has observed that in machine learning optimization, gradient descent (GD) often operates at the edge of stability (EoS) [Cohen et al., 2021], where the stepsizes are set …

Gradient descent monotonically decreases the sharpness of gradient flow solutions in scalar networks and beyond

I Kreisler, MS Nacson, D Soudry… - … on Machine Learning, 2023 - proceedings.mlr.press
Recent research shows that when Gradient Descent (GD) is applied to neural networks, the
loss almost never decreases monotonically. Instead, the loss oscillates as gradient descent …