On the implicit bias in deep-learning algorithms

G Vardi - Communications of the ACM, 2023 - dl.acm.org
Deep learning has been highly successful in recent years and has led to dramatic improvements in multiple domains …

Understanding gradient descent on the edge of stability in deep learning

S Arora, Z Li, A Panigrahi - International Conference on …, 2022 - proceedings.mlr.press
Deep learning experiments by Cohen et al. (2021) using deterministic Gradient Descent (GD) revealed an Edge of Stability (EoS) phase when learning rate (LR) and …

Understanding the generalization benefit of normalization layers: Sharpness reduction

K Lyu, Z Li, S Arora - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Normalization layers (e.g., Batch Normalization, Layer Normalization) were introduced to help with optimization difficulties in very deep nets, but they clearly also help …

SGD with large step sizes learns sparse features

M Andriushchenko, AV Varre… - International …, 2023 - proceedings.mlr.press
We showcase important features of the dynamics of Stochastic Gradient Descent (SGD) in the training of neural networks. We present empirical observations that commonly used …

(S)GD over Diagonal Linear Networks: Implicit Bias, Large Stepsizes and Edge of Stability

M Even, S Pesme, S Gunasekar… - Advances in Neural …, 2023 - proceedings.neurips.cc
In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over 2 …

Learning threshold neurons via edge of stability

K Ahn, S Bubeck, S Chewi, YT Lee… - Advances in Neural …, 2023 - proceedings.neurips.cc
Existing analyses of neural network training often operate under the unrealistic assumption
of an extremely small learning rate. This lies in stark contrast to practical wisdom and …

Understanding edge-of-stability training dynamics with a minimalist example

X Zhu, Z Wang, X Wang, M Zhou, R Ge - arXiv preprint arXiv:2210.03294, 2022 - arxiv.org
Recently, researchers observed that gradient descent for deep neural networks operates in an "edge-of-stability" (EoS) regime: the sharpness (maximum eigenvalue of the Hessian) is …

Implicit bias of gradient descent for logistic regression at the edge of stability

J Wu, V Braverman, JD Lee - Advances in Neural …, 2024 - proceedings.neurips.cc
Recent research has observed that in machine learning optimization, gradient descent (GD) often operates at the edge of stability (EoS) [Cohen et al., 2021], where the stepsizes are set …

Gradient descent monotonically decreases the sharpness of gradient flow solutions in scalar networks and beyond

I Kreisler, MS Nacson, D Soudry… - … on Machine Learning, 2023 - proceedings.mlr.press
Recent research shows that when Gradient Descent (GD) is applied to neural networks, the
loss almost never decreases monotonically. Instead, the loss oscillates as gradient descent …