Neural collapse: A review on modelling principles and generalization
V Kothapalli - arXiv preprint arXiv:2206.04041, 2022 - arxiv.org
Deep classifier neural networks enter the terminal phase of training (TPT) when training
error reaches zero and tend to exhibit intriguing Neural Collapse (NC) properties. Neural …
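For context, the "NC properties" mentioned here are usually summarized as four conditions on the last-layer features and classifier during TPT. The statement below follows the standard formulation from the neural collapse literature (notation assumed here: features $h_{i,c}$, class means $\bar h_c$, global mean $\bar h_G$, classifier rows $w_c$), not this survey's exact wording:

\begin{align*}
&\textbf{(NC1)}\ \text{variability collapse:}\quad \Sigma_W := \operatorname*{Avg}_{i,c}\,(h_{i,c}-\bar h_c)(h_{i,c}-\bar h_c)^\top \;\to\; 0,\\
&\textbf{(NC2)}\ \text{simplex ETF:}\quad \|\bar h_c-\bar h_G\| \text{ equal for all } c,\quad \frac{\langle \bar h_c-\bar h_G,\ \bar h_{c'}-\bar h_G\rangle}{\|\bar h_c-\bar h_G\|\,\|\bar h_{c'}-\bar h_G\|} \;\to\; \frac{-1}{C-1}\ \ (c\neq c'),\\
&\textbf{(NC3)}\ \text{self-duality:}\quad w_c \;\propto\; \bar h_c-\bar h_G,\\
&\textbf{(NC4)}\ \text{nearest class mean:}\quad \arg\max_{c}\,\langle w_c, h\rangle \;=\; \arg\min_{c}\,\|h-\bar h_c\|.
\end{align*}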
A farewell to the bias-variance tradeoff? An overview of the theory of overparameterized machine learning
The rapid recent progress in machine learning (ML) has raised a number of scientific
questions that challenge the longstanding dogma of the field. One of the most important …
Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation
M Belkin - Acta Numerica, 2021 - cambridge.org
In the past decade the mathematical theory of machine learning has lagged far behind the
triumphs of deep neural networks on practical challenges. However, the gap between theory …
Gradient starvation: A learning proclivity in neural networks
We identify and formalize a fundamental gradient descent phenomenon resulting in a
learning proclivity in over-parameterized neural networks. Gradient Starvation arises when …
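The dynamics at issue can be caricatured with a linear model: once a strongly predictive feature has driven the loss down, the saturating loss supplies little gradient for a weaker but still predictive feature. A minimal toy sketch (not the paper's setup or architecture; the feature scales and optimizer below are arbitrary illustrative choices):

```python
# Toy caricature of gradient starvation with logistic regression and plain GD:
# the dominant feature soaks up most of the loss reduction early, and the
# resulting sigmoid saturation shrinks the gradient that would have grown the
# weight on the weaker feature.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, n) * 2 - 1              # labels in {-1, +1}
x1 = y * 3.0 + rng.normal(0, 0.5, n)           # strong feature: large margin
x2 = y * 0.5 + rng.normal(0, 0.5, n)           # weak feature: small margin, still predictive
X = np.stack([x1, x2], axis=1)

w = np.zeros(2)
lr = 0.1
for _ in range(2000):
    margins = y * (X @ w)
    # gradient of the mean logistic loss log(1 + exp(-y * x.w))
    grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
    w -= lr * grad

print("learned weights:", w)   # w[0] ends up much larger than w[1]
```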
Cramming: Training a language model on a single GPU in one day
J Geiping, T Goldstein - International Conference on …, 2023 - proceedings.mlr.press
Recent trends in language modeling have focused on increasing performance through
scaling, and have resulted in an environment where training language models is out of …
Test time adaptation via conjugate pseudo-labels
Test-time adaptation (TTA) refers to adapting neural networks to distribution shifts,
specifically with just access to unlabeled test samples from the new domain at test-time …
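A minimal sketch of the test-time adaptation setting described here, assuming a generic PyTorch classifier: only unlabeled test batches are available, and the model is updated by minimizing a self-supervised objective on its own predictions. Entropy minimization is used below purely as a stand-in objective; the cited paper instead derives its pseudo-labels from the convex conjugate of the training loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))  # stand-in classifier
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

def adapt_on_batch(x_test: torch.Tensor) -> torch.Tensor:
    """One TTA step on an unlabeled test batch; returns the adapted predictions."""
    logits = model(x_test)
    probs = F.softmax(logits, dim=1)
    # self-supervised loss: mean prediction entropy (no labels used anywhere)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return model(x_test).argmax(dim=1)

# Usage with a synthetic "shifted" test batch.
preds = adapt_on_batch(torch.randn(64, 16))
```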
On the optimization landscape of neural collapse under MSE loss: Global optimality with unconstrained features
When training deep neural networks for classification tasks, an intriguing empirical
phenomenon has been widely observed in the last-layer classifiers and features, where (i) …
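The unconstrained features model referred to in the title treats the last-layer features as free optimization variables, decoupled from the backbone. A commonly studied regularized MSE formulation (notation assumed here, not necessarily the paper's exact statement) is

\[
\min_{W,\,H,\,b}\;\; \frac{1}{2N}\,\bigl\|W H + b\,\mathbf{1}_N^{\top} - Y\bigr\|_F^{2} \;+\; \frac{\lambda_W}{2}\,\|W\|_F^{2} \;+\; \frac{\lambda_H}{2}\,\|H\|_F^{2} \;+\; \frac{\lambda_b}{2}\,\|b\|_2^{2},
\]

where $H \in \mathbb{R}^{d\times N}$ stacks the $N$ last-layer features as columns, $W \in \mathbb{R}^{C\times d}$ is the linear classifier, $b \in \mathbb{R}^{C}$ is the bias, and $Y \in \mathbb{R}^{C\times N}$ is the one-hot label matrix.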
The effects of regularization and data augmentation are class dependent
R Balestriero, L Bottou… - Advances in Neural …, 2022 - proceedings.neurips.cc
Regularization is a fundamental technique to prevent over-fitting and to improve
generalization performance by constraining a model's complexity. Current Deep Networks …
Extended unconstrained features model for exploring deep neural collapse
The modern strategy for training deep neural networks for classification tasks includes
optimizing the network's weights even after the training error vanishes to further push the …
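Because the unconstrained features model optimizes the last-layer features directly, the "keep training past zero classification error" regime described here is easy to reproduce numerically. A small sketch under assumed hyperparameters (the dimensions, learning rate, and weight decay below are illustrative choices, not the paper's):

```python
# Unconstrained features model: optimize features H and classifier W jointly,
# then check how small the within-class feature variability is relative to the
# spread of the class means (it should be much smaller if collapse has occurred).
import torch

C, d, n_per_class = 4, 8, 32
N = C * n_per_class
labels = torch.arange(C).repeat_interleave(n_per_class)

H = torch.randn(d, N, requires_grad=True)      # free "features"
W = torch.randn(C, d, requires_grad=True)      # linear classifier
opt = torch.optim.SGD([H, W], lr=0.1)
wd = 5e-3                                      # weight decay on both W and H

for step in range(5000):
    logits = (W @ H).T
    loss = torch.nn.functional.cross_entropy(logits, labels) \
           + wd * (W.square().sum() + H.square().sum())
    opt.zero_grad()
    loss.backward()
    opt.step()

means = torch.stack([H[:, labels == c].mean(dim=1) for c in range(C)], dim=1)
within = sum(((H[:, labels == c] - means[:, [c]]) ** 2).mean() for c in range(C)) / C
between = ((means - means.mean(dim=1, keepdim=True)) ** 2).mean()
print("within-class variability:", float(within.detach()))
print("between-class spread:   ", float(between.detach()))
```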
Are all losses created equal: A neural collapse perspective
While cross entropy (CE) is the most commonly used loss function to train deep neural
networks for classification tasks, many alternative losses have been developed to obtain …
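For reference, the cross-entropy loss the snippet refers to, written over last-layer features $h_i$ with classifier rows $w_c$ and biases $b_c$ (notation assumed here, not taken from the paper):

\[
\mathcal{L}_{\mathrm{CE}} \;=\; -\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp(w_{y_i}^{\top} h_i + b_{y_i})}{\sum_{c=1}^{C}\exp(w_c^{\top} h_i + b_c)}.
\]

The alternative losses studied in this line of work, such as label smoothing and focal loss, replace this term while keeping the same last-layer parameterization.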