Neural collapse: A review on modelling principles and generalization
V Kothapalli - arXiv preprint arXiv:2206.04041, 2022 - arxiv.org
Deep classifier neural networks enter the terminal phase of training (TPT) when training
error reaches zero and tend to exhibit intriguing Neural Collapse (NC) properties. Neural …
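For context, the "NC properties" mentioned here are usually summarized as four conditions on the last-layer features and classifier during TPT. The statement below follows the standard formulation from the neural collapse literature (notation assumed here: features $h_{i,c}$, class means $\bar h_c$, global mean $\bar h_G$, classifier rows $w_c$), not this survey's exact wording:

\begin{align*}
&\textbf{(NC1)}\ \text{variability collapse:}\quad \Sigma_W := \operatorname*{Avg}_{i,c}\,(h_{i,c}-\bar h_c)(h_{i,c}-\bar h_c)^\top \;\to\; 0,\\
&\textbf{(NC2)}\ \text{simplex ETF:}\quad \|\bar h_c-\bar h_G\| \text{ equal for all } c,\quad \frac{\langle \bar h_c-\bar h_G,\ \bar h_{c'}-\bar h_G\rangle}{\|\bar h_c-\bar h_G\|\,\|\bar h_{c'}-\bar h_G\|} \;\to\; \frac{-1}{C-1}\ \ (c\neq c'),\\
&\textbf{(NC3)}\ \text{self-duality:}\quad w_c \;\propto\; \bar h_c-\bar h_G,\\
&\textbf{(NC4)}\ \text{nearest class mean:}\quad \arg\max_{c}\,\langle w_c, h\rangle \;=\; \arg\min_{c}\,\|h-\bar h_c\|.
\end{align*}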
A farewell to the bias-variance tradeoff? An overview of the theory of overparameterized machine learning
The rapid recent progress in machine learning (ML) has raised a number of scientific
questions that challenge the longstanding dogma of the field. One of the most important …
Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation
M Belkin - Acta Numerica, 2021 - cambridge.org
In the past decade the mathematical theory of machine learning has lagged far behind the
triumphs of deep neural networks on practical challenges. However, the gap between theory …
Gradient starvation: A learning proclivity in neural networks
We identify and formalize a fundamental gradient descent phenomenon resulting in a
learning proclivity in over-parameterized neural networks. Gradient Starvation arises when …
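The dynamics at issue can be caricatured with a linear model: once a strongly predictive feature has driven the loss down, the saturating loss supplies little gradient for a weaker but still predictive feature. A minimal toy sketch (not the paper's setup or architecture; the feature scales and optimizer below are arbitrary illustrative choices):

```python
# Toy caricature of gradient starvation with logistic regression and plain GD:
# the dominant feature soaks up most of the loss reduction early, and the
# resulting sigmoid saturation shrinks the gradient that would have grown the
# weight on the weaker feature.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, n) * 2 - 1              # labels in {-1, +1}
x1 = y * 3.0 + rng.normal(0, 0.5, n)           # strong feature: large margin
x2 = y * 0.5 + rng.normal(0, 0.5, n)           # weak feature: small margin, still predictive
X = np.stack([x1, x2], axis=1)

w = np.zeros(2)
lr = 0.1
for _ in range(2000):
    margins = y * (X @ w)
    # gradient of the mean logistic loss log(1 + exp(-y * x.w))
    grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
    w -= lr * grad

print("learned weights:", w)   # w[0] ends up much larger than w[1]
```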
Cramming: Training a language model on a single GPU in one day
J Geiping, T Goldstein - International Conference on …, 2023 - proceedings.mlr.press
Recent trends in language modeling have focused on increasing performance through
scaling, and have resulted in an environment where training language models is out of …
Test time adaptation via conjugate pseudo-labels
Test-time adaptation (TTA) refers to adapting neural networks to distribution shifts,
specifically with just access to unlabeled test samples from the new domain at test-time …
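A minimal sketch of the test-time adaptation setting described here, assuming a generic PyTorch classifier: only unlabeled test batches are available, and the model is updated by minimizing a self-supervised objective on its own predictions. Entropy minimization is used below purely as a stand-in objective; the cited paper instead derives its pseudo-labels from the convex conjugate of the training loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))  # stand-in classifier
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

def adapt_on_batch(x_test: torch.Tensor) -> torch.Tensor:
    """One TTA step on an unlabeled test batch; returns the adapted predictions."""
    logits = model(x_test)
    probs = F.softmax(logits, dim=1)
    # self-supervised loss: mean prediction entropy (no labels used anywhere)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return model(x_test).argmax(dim=1)

# Usage with a synthetic "shifted" test batch.
preds = adapt_on_batch(torch.randn(64, 16))
```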
On the optimization landscape of neural collapse under MSE loss: Global optimality with unconstrained features
When training deep neural networks for classification tasks, an intriguing empirical
phenomenon has been widely observed in the last-layer classifiers and features, where (i) …
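The unconstrained features model referred to in the title treats the last-layer features as free optimization variables, decoupled from the backbone. A commonly studied regularized MSE formulation (notation assumed here, not necessarily the paper's exact statement) is

\[
\min_{W,\,H,\,b}\;\; \frac{1}{2N}\,\bigl\|W H + b\,\mathbf{1}_N^{\top} - Y\bigr\|_F^{2} \;+\; \frac{\lambda_W}{2}\,\|W\|_F^{2} \;+\; \frac{\lambda_H}{2}\,\|H\|_F^{2} \;+\; \frac{\lambda_b}{2}\,\|b\|_2^{2},
\]

where $H \in \mathbb{R}^{d\times N}$ stacks the $N$ last-layer features as columns, $W \in \mathbb{R}^{C\times d}$ is the linear classifier, $b \in \mathbb{R}^{C}$ is the bias, and $Y \in \mathbb{R}^{C\times N}$ is the one-hot label matrix.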
The effects of regularization and data augmentation are class dependent
R Balestriero, L Bottou… - Advances in Neural …, 2022 - proceedings.neurips.cc
Regularization is a fundamental technique to prevent over-fitting and to improve
generalization performance by constraining a model's complexity. Current Deep Networks …
Extended unconstrained features model for exploring deep neural collapse
The modern strategy for training deep neural networks for classification tasks includes
optimizing the network's weights even after the training error vanishes to further push the …
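Because the unconstrained features model optimizes the last-layer features directly, the "keep training past zero classification error" regime described here is easy to reproduce numerically. A small sketch under assumed hyperparameters (the dimensions, learning rate, and weight decay below are illustrative choices, not the paper's):

```python
# Unconstrained features model: optimize features H and classifier W jointly,
# then check how small the within-class feature variability is relative to the
# spread of the class means (it should be much smaller if collapse has occurred).
import torch

C, d, n_per_class = 4, 8, 32
N = C * n_per_class
labels = torch.arange(C).repeat_interleave(n_per_class)

H = torch.randn(d, N, requires_grad=True)      # free "features"
W = torch.randn(C, d, requires_grad=True)      # linear classifier
opt = torch.optim.SGD([H, W], lr=0.1)
wd = 5e-3                                      # weight decay on both W and H

for step in range(5000):
    logits = (W @ H).T
    loss = torch.nn.functional.cross_entropy(logits, labels) \
           + wd * (W.square().sum() + H.square().sum())
    opt.zero_grad()
    loss.backward()
    opt.step()

means = torch.stack([H[:, labels == c].mean(dim=1) for c in range(C)], dim=1)
within = sum(((H[:, labels == c] - means[:, [c]]) ** 2).mean() for c in range(C)) / C
between = ((means - means.mean(dim=1, keepdim=True)) ** 2).mean()
print("within-class variability:", float(within.detach()))
print("between-class spread:   ", float(between.detach()))
```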
Are all losses created equal: A neural collapse perspective
While cross entropy (CE) is the most commonly used loss function to train deep neural
networks for classification tasks, many alternative losses have been developed to obtain …
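For reference, the cross-entropy loss the snippet refers to, written over last-layer features $h_i$ with classifier rows $w_c$ and biases $b_c$ (notation assumed here, not taken from the paper):

\[
\mathcal{L}_{\mathrm{CE}} \;=\; -\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp(w_{y_i}^{\top} h_i + b_{y_i})}{\sum_{c=1}^{C}\exp(w_c^{\top} h_i + b_c)}.
\]

The alternative losses studied in this line of work, such as label smoothing and focal loss, replace this term while keeping the same last-layer parameterization.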