Neural collapse: A review on modelling principles and generalization

V Kothapalli - arXiv preprint arXiv:2206.04041, 2022 - arxiv.org
Deep classifier neural networks enter the terminal phase of training (TPT) when training
error reaches zero and tend to exhibit intriguing Neural Collapse (NC) properties. Neural …
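
For concreteness about what the review surveys: the first NC property (NC1) says within-class variability of last-layer features vanishes relative to between-class variability. A minimal numpy sketch of the metric commonly used to track this (normalizations vary across papers; this version is illustrative, not the review's code):

```python
import numpy as np

def nc1_metric(features, labels):
    """NC1 variability-collapse metric: tr(Sigma_W @ pinv(Sigma_B)) / K.

    features: (n, d) last-layer activations; labels: (n,) integer classes.
    """
    classes = np.unique(labels)
    K, d = len(classes), features.shape[1]
    mu_g = features.mean(axis=0)
    Sigma_W = np.zeros((d, d))
    Sigma_B = np.zeros((d, d))
    for c in classes:
        fc = features[labels == c]
        mu_c = fc.mean(axis=0)
        Sigma_W += (fc - mu_c).T @ (fc - mu_c) / len(features)
        Sigma_B += np.outer(mu_c - mu_g, mu_c - mu_g) / K
    return np.trace(Sigma_W @ np.linalg.pinv(Sigma_B)) / K

# Fully collapsed toy features (every point equals its class mean) give 0.
feats = np.repeat(np.eye(3), 10, axis=0)   # (30, 3)
labs = np.repeat(np.arange(3), 10)         # (30,)
print(nc1_metric(feats, labs))             # ~0.0
```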

A farewell to the bias-variance tradeoff? An overview of the theory of overparameterized machine learning

Y Dar, V Muthukumar, RG Baraniuk - arXiv preprint arXiv:2109.02355, 2021 - arxiv.org
The rapid recent progress in machine learning (ML) has raised a number of scientific
questions that challenge the longstanding dogma of the field. One of the most important …
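
The survey's motivating phenomenon, benign overparameterization, is easiest to reproduce as the double-descent curve of minimum-norm least squares on random features. A hedged numpy sketch (the random-feature setup and all sizes are my own illustrative choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, d = 50, 500, 20
X = rng.normal(size=(n_train + n_test, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n_train + n_test)

def random_features(X, p):
    """Fixed random nonlinear features; p controls (over)parameterization."""
    W = np.random.default_rng(1).normal(size=(X.shape[1], p))
    return np.tanh(X @ W / np.sqrt(X.shape[1]))

for p in [10, 30, 45, 50, 55, 100, 500, 2000]:
    Phi = random_features(X, p)
    # pinv gives least squares for p < n and the min-norm interpolator for p >= n.
    beta = np.linalg.pinv(Phi[:n_train]) @ y[:n_train]
    mse = np.mean((Phi[n_train:] @ beta - y[n_train:]) ** 2)
    print(f"p={p:4d}  test MSE={mse:.3f}")  # error peaks near p = n_train, then falls
```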

Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation

M Belkin - Acta Numerica, 2021 - cambridge.org
In the past decade the mathematical theory of machine learning has lagged far behind the
triumphs of deep neural networks on practical challenges. However, the gap between theory …

Gradient starvation: A learning proclivity in neural networks

M Pezeshki, O Kaba, Y Bengio… - Advances in Neural Information Processing Systems, 2021 - proceedings.neurips.cc
We identify and formalize a fundamental gradient descent phenomenon resulting in a
learning proclivity in over-parameterized neural networks. Gradient Starvation arises when …
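
A toy logistic-regression illustration of the effect (my own construction, far simpler than the paper's NTK-based analysis): once a strongly predictive feature drives margins up, per-example gradients shrink and the weaker feature is barely learned.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
y = rng.choice([-1.0, 1.0], size=n)
x_strong = 2.0 * y + 0.1 * rng.normal(size=n)   # easily separable feature
x_weak = 0.5 * y + 1.0 * rng.normal(size=n)     # weaker but still informative
X = np.stack([x_strong, x_weak], axis=1)

w = np.zeros(2)
for _ in range(2000):
    margins = y * (X @ w)
    # Logistic-loss gradient; each example is weighted by sigmoid(-margin),
    # which shrinks quickly as the strong feature drives margins up.
    grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
    w -= 0.5 * grad

print(w)  # w[0] dominates; the weak feature's gradient was "starved"
```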

Cramming: Training a language model on a single GPU in one day

J Geiping, T Goldstein - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Recent trends in language modeling have focused on increasing performance through
scaling, and have resulted in an environment where training language models is out of …

Test-time adaptation via conjugate pseudo-labels

S Goyal, M Sun, A Raghunathan… - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
Test-time adaptation (TTA) refers to adapting neural networks to distribution shifts,
specifically with just access to unlabeled test samples from the new domain at test-time …
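
For intuition on the method: a minimal PyTorch sketch of one adaptation step, assuming a cross-entropy-trained source model, where the paper shows the conjugate pseudo-label reduces to the softmax output itself. The `model`/`optimizer` setup (e.g. adapting only normalization parameters) is an assumption, not shown here.

```python
import torch.nn.functional as F

def tta_step(model, optimizer, x_test):
    """One self-training adaptation step on an unlabeled test batch."""
    logits = model(x_test)
    pseudo = F.softmax(logits, dim=-1).detach()   # conjugate pseudo-labels (CE case)
    loss = F.cross_entropy(logits, pseudo)        # cross-entropy with soft targets
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```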

On the optimization landscape of neural collapse under MSE loss: Global optimality with unconstrained features

J Zhou, X Li, T Ding, C You, Q Qu… - International Conference on Machine Learning, 2022 - proceedings.mlr.press
When training deep neural networks for classification tasks, an intriguing empirical
phenomenon has been widely observed in the last-layer classifiers and features, where (i) …
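
The "unconstrained features" setup analyzed here fits in a dozen lines: last-layer features are optimized jointly with the classifier, with no network generating them. A sketch of the regularized MSE objective (all sizes, learning rate, and weight decays are placeholder choices; convergence to the ETF geometry may need tuning):

```python
import torch

K, d, n_per = 4, 16, 32                            # placeholder sizes
N = K * n_per
Y = torch.eye(K).repeat_interleave(n_per, dim=1)   # one-hot targets, shape (K, N)
W = torch.randn(K, d, requires_grad=True)          # last-layer classifier
H = torch.randn(d, N, requires_grad=True)          # features as free variables
opt = torch.optim.SGD([W, H], lr=0.5)

for _ in range(20000):
    opt.zero_grad()
    loss = ((W @ H - Y) ** 2).sum() / (2 * N) \
        + 1e-4 * ((W ** 2).sum() + (H ** 2).sum())
    loss.backward()
    opt.step()

# At global optima the class-mean features form a simplex ETF: normalized
# means should have pairwise inner products near -1/(K-1) (here -1/3).
means = torch.stack([H[:, i * n_per:(i + 1) * n_per].mean(dim=1) for i in range(K)])
Mn = means / means.norm(dim=1, keepdim=True)
print((Mn @ Mn.T).round(decimals=2))
```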

The effects of regularization and data augmentation are class dependent

R Balestriero, L Bottou… - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
Regularization is a fundamental technique to prevent over-fitting and to improve
generalization performance by constraining a model's complexity. Current Deep Networks …
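
The paper's headline observation is that aggregate accuracy can hide class-level harm from a regularizer or augmentation. A trivial but relevant sketch of the class-conditional accuracy one would compare across training settings:

```python
import numpy as np

def per_class_accuracy(preds, labels, num_classes):
    """Accuracy computed separately per class; comparing these vectors between
    two training runs exposes class-dependent effects the average hides."""
    return np.array([(preds[labels == c] == c).mean() for c in range(num_classes)])

# Toy example: identical 90% average accuracy, very different per-class profiles.
labels = np.repeat(np.arange(2), 100)
run_a = labels.copy(); run_a[:20] = 1 - run_a[:20]    # errors all in class 0
run_b = labels.copy(); run_b[::10] = 1 - run_b[::10]  # errors spread evenly
print(per_class_accuracy(run_a, labels, 2))  # [0.8, 1.0]
print(per_class_accuracy(run_b, labels, 2))  # [0.9, 0.9]
```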

Extended unconstrained features model for exploring deep neural collapse

T Tirer, J Bruna - International Conference on Machine Learning, 2022 - proceedings.mlr.press
The modern strategy for training deep neural networks for classification tasks includes
optimizing the network's weights even after the training error vanishes to further push the …
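
Both this paper and the NC literature it extends measure convergence toward the simplex equiangular tight frame (ETF). For reference, the canonical K-class construction and its defining Gram structure (a standard construction, not code from the paper):

```python
import numpy as np

def simplex_etf(K):
    """Rows are K unit vectors with pairwise cosine exactly -1/(K-1)."""
    return np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)

M = simplex_etf(5)
print(np.round(M @ M.T, 3))  # 1 on the diagonal, -0.25 off it
```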

Are all losses created equal: A neural collapse perspective

J Zhou, C You, X Li, K Liu, S Liu… - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
While cross entropy (CE) is the most commonly used loss function to train deep neural
networks for classification tasks, many alternative losses have been developed to obtain …
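
The losses this comparison covers each fit in a line or two; a quick PyTorch sketch evaluating cross-entropy, label smoothing, and focal loss on the same logits (the smoothing and gamma values are arbitrary illustrative choices):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 10)
targets = torch.randint(0, 10, (8,))

ce = F.cross_entropy(logits, targets)
ls = F.cross_entropy(logits, targets, label_smoothing=0.1)
p_t = F.softmax(logits, dim=-1).gather(1, targets[:, None]).squeeze(1)
focal = (-(1.0 - p_t) ** 2 * torch.log(p_t)).mean()  # focal loss, gamma = 2

print(f"CE={ce:.3f}  label-smoothing={ls:.3f}  focal={focal:.3f}")
```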