Neural collapse: A review on modelling principles and generalization
V Kothapalli - arXiv preprint arXiv:2206.04041, 2022 - arxiv.org
Deep classifier neural networks enter the terminal phase of training (TPT) when training
error reaches zero and tend to exhibit intriguing Neural Collapse (NC) properties. Neural …
Directional convergence and alignment in deep learning
Z Ji, M Telgarsky - Advances in Neural Information …, 2020 - proceedings.neurips.cc
In this paper, we show that although the minimizers of cross-entropy and related
classification losses are off at infinity, network weights learned by gradient flow converge in …
Fantastic generalization measures and where to find them
Generalization of deep networks has been of great interest in recent years, resulting in a
number of theoretically and empirically motivated complexity measures. However, most …
On the measure of intelligence
F Chollet - arXiv preprint arXiv:1911.01547, 2019 - arxiv.org
To make deliberate progress towards more intelligent and more human-like artificial
systems, we need to be following an appropriate feedback signal: we need to be able to …
The modern mathematics of deep learning
We describe the new field of the mathematical analysis of deep learning. This field emerged
around a list of research questions that were not answered within the classical framework of …
Predicting with confidence on unseen distributions
Recent work has shown that the accuracy of machine learning models can vary substantially
when evaluated on a distribution that even slightly differs from that of the training data. As a …
Network pruning via performance maximization
Channel pruning is a class of powerful methods for model compression. When pruning a
neural network, it is ideal to obtain a sub-network with higher accuracy. However, a sub …
Exploring the limits of large scale pre-training
Recent developments in large-scale machine learning suggest that by scaling up data,
model size and training time properly, one might observe that improvements in pre-training …
Deep learning through the lens of example difficulty
Existing work on understanding deep learning often employs measures that compress all
data-dependent information into a few numbers. In this work, we adopt a perspective based …
Permutation equivariant neural functionals
This work studies the design of neural networks that can process the weights or gradients of
other neural networks, which we refer to as neural functional networks (NFNs). Despite a …