On the implicit bias in deep-learning algorithms
G Vardi - Communications of the ACM, 2023 - dl.acm.org
Deep learning has been highly successful in recent years and has led to dramatic
improvements in multiple domains …
Understanding gradient descent on the edge of stability in deep learning
Deep learning experiments by Cohen et al. (2021) using deterministic Gradient
Descent (GD) revealed an Edge of Stability (EoS) phase when learning rate (LR) and …
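For context, the stability threshold that gives the phenomenon its name: on a quadratic with curvature λ, gradient descent with step size η contracts only when η < 2/λ, and EoS training is observed to hover around that boundary. A minimal NumPy sketch (the curvature and step sizes below are illustrative, not taken from the paper):

    import numpy as np

    # GD on the 1-D quadratic f(x) = 0.5 * lam * x**2, whose curvature (sharpness) is lam.
    # The update x <- x - eta * lam * x = (1 - eta * lam) * x contracts iff |1 - eta * lam| < 1,
    # i.e. iff eta < 2 / lam, the classical stability threshold that EoS training hovers around.
    lam = 4.0
    for eta in (0.4, 0.5, 0.6):          # below, at, and above 2 / lam = 0.5
        x = 1.0
        for _ in range(50):
            x -= eta * lam * x
        print(f"eta={eta:.1f}  |x| after 50 steps = {abs(x):.3e}")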
Understanding the generalization benefit of normalization layers: Sharpness reduction
Normalization layers (e.g., Batch Normalization, Layer Normalization) were
introduced to help with optimization difficulties in very deep nets, but they clearly also help …
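As a reminder of what such a layer computes, here is a minimal layer-normalization forward pass (per-example standardization over features followed by a learned affine map); the shapes and parameter values are illustrative only and say nothing about the sharpness-reduction analysis itself:

    import numpy as np

    def layer_norm(x, gamma, beta, eps=1e-5):
        # Normalize each row (one example) to zero mean and unit variance across its features,
        # then rescale and shift with the learned parameters gamma and beta.
        mu = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        return gamma * (x - mu) / np.sqrt(var + eps) + beta

    x = np.random.randn(8, 16)                                # batch of 8 examples, 16 features
    out = layer_norm(x, gamma=np.ones(16), beta=np.zeros(16))
    print(out.mean(axis=-1).round(6), out.std(axis=-1).round(3))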
SGD with large step sizes learns sparse features
M Andriushchenko, AV Varre… - International …, 2023 - proceedings.mlr.press
We showcase important features of the dynamics of the Stochastic Gradient Descent (SGD)
in the training of neural networks. We present empirical observations that commonly used …
(S)GD over Diagonal Linear Networks: Implicit Bias, Large Stepsizes and Edge of Stability
In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit
regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over 2 …
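To make the model concrete: a 2-layer diagonal linear network parametrises the predictor as w = u ⊙ v and runs (S)GD on (u, v) rather than on w directly. A rough full-batch GD sketch on a sparse least-squares problem under this reparametrisation (the data, initialisation scale, and step size are illustrative assumptions, not the paper's setup):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 20, 50
    X = rng.standard_normal((n, d))
    w_star = np.zeros(d)
    w_star[:3] = 1.0                               # sparse ground truth
    y = X @ w_star

    alpha = 0.1                                    # initialisation scale
    u = alpha * np.ones(d)
    v = alpha * np.ones(d)
    eta = 0.01

    for _ in range(5000):
        w = u * v                                  # effective linear predictor
        g = X.T @ (X @ w - y) / n                  # gradient of the squared loss w.r.t. w
        u, v = u - eta * g * v, v - eta * g * u    # chain rule through w = u * v
    print("largest recovered coordinates:", np.argsort(-np.abs(u * v))[:3])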
Learning threshold neurons via edge of stability
Existing analyses of neural network training often operate under the unrealistic assumption
of an extremely small learning rate. This lies in stark contrast to practical wisdom and …
Understanding edge-of-stability training dynamics with a minimalist example
Recently, researchers observed that gradient descent for deep neural networks operates in
an "edge-of-stability" (EoS) regime: the sharpness (maximum eigenvalue of the Hessian) is …
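Here "sharpness" means the largest eigenvalue of the loss Hessian at the current iterate, and the EoS observation is that GD drives it up to, and then holds it near, 2/η. A small finite-difference sketch of how one can monitor that quantity (the one-hidden-unit model, data, and step size are illustrative; whether a given run actually settles at the threshold depends on the problem):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((16, 2))
    y = rng.standard_normal(16)

    def loss(p):
        # One-hidden-unit tanh network with parameters p = (w1, w2, v); purely illustrative.
        return 0.5 * np.mean((p[2] * np.tanh(X @ p[:2]) - y) ** 2)

    def grad(p, h=1e-5):
        e = np.eye(p.size)
        return np.array([(loss(p + h * e[i]) - loss(p - h * e[i])) / (2 * h) for i in range(p.size)])

    def sharpness(p, h=1e-4):
        # Largest eigenvalue of a finite-difference estimate of the loss Hessian.
        e = np.eye(p.size)
        H = np.array([(grad(p + h * e[i]) - grad(p - h * e[i])) / (2 * h) for i in range(p.size)])
        return float(np.linalg.eigvalsh((H + H.T) / 2).max())

    eta = 0.2
    p = 0.5 * rng.standard_normal(3)
    for _ in range(500):
        p -= eta * grad(p)
    print(f"sharpness = {sharpness(p):.3f}   2/eta = {2 / eta:.3f}")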
(S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability
In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit
regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over …
Implicit bias of gradient descent for logistic regression at the edge of stability
Recent research has observed that in machine learning optimization, gradient descent (GD)
often operates at the edge of stability (EoS) [Cohen et al., 2021], where the stepsizes are set …
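A minimal sketch of that setting: full-batch GD on the logistic loss over linearly separable data, run with a deliberately large constant step size (the data and the value of the step size are illustrative, not the paper's):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 100, 2
    X = rng.standard_normal((n, d))
    y = np.where(X[:, 0] + 0.2 > 0, 1.0, -1.0)     # linearly separable labels in {-1, +1}

    def logistic_loss(w):
        return np.mean(np.log1p(np.exp(-y * (X @ w))))

    w = np.zeros(d)
    eta = 10.0                                     # large constant step size
    for _ in range(1000):
        s = -y / (1.0 + np.exp(y * (X @ w)))       # derivative of the loss w.r.t. the margins
        w -= eta * (X.T @ s) / n
    print("final loss:", logistic_loss(w), " ||w|| =", np.linalg.norm(w))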
Gradient descent monotonically decreases the sharpness of gradient flow solutions in scalar networks and beyond
Recent research shows that when Gradient Descent (GD) is applied to neural networks, the
loss almost never decreases monotonically. Instead, the loss oscillates as gradient descent …
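In the simplest scalar example the "network" is just a product of scalar weights, e.g. L(a, b) = (ab − 1)^2 / 2, whose sharpness at a global minimum is a^2 + b^2. A tiny GD loop (initialisation, target, and step size chosen only for illustration) makes the oscillating loss and the bias toward a flatter minimum easy to reproduce:

    # Depth-2 scalar network: prediction a*b, target 1, loss L(a, b) = 0.5 * (a*b - 1)**2.
    # At a minimum (a*b = 1) the sharpness equals a**2 + b**2, so GD with step size eta can
    # only settle at minima with a**2 + b**2 <= 2/eta; until then the loss oscillates.
    a, b = 3.0, 0.0
    eta = 0.3
    losses = []
    for _ in range(100):
        r = a * b - 1.0
        a, b = a - eta * r * b, b - eta * r * a
        losses.append(0.5 * r ** 2)
    ups = sum(l2 > l1 for l1, l2 in zip(losses, losses[1:]))
    print("loss increased on", ups, "of 99 steps")
    print("final a^2 + b^2 =", a ** 2 + b ** 2, "  vs  2/eta =", 2 / eta)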