Neural collapse: A review on modelling principles and generalization
V Kothapalli - arXiv preprint arXiv:2206.04041, 2022 - arxiv.org
Deep classifier neural networks enter the terminal phase of training (TPT) when training
error reaches zero and tend to exhibit intriguing Neural Collapse (NC) properties. Neural …
Directional convergence and alignment in deep learning
Z Ji, M Telgarsky - Advances in Neural Information …, 2020 - proceedings.neurips.cc
In this paper, we show that although the minimizers of cross-entropy and related
classification losses are off at infinity, network weights learned by gradient flow converge in …
Fantastic generalization measures and where to find them
Generalization of deep networks has been of great interest in recent years, resulting in a
number of theoretically and empirically motivated complexity measures. However, most …
On the measure of intelligence
F Chollet - arXiv preprint arXiv:1911.01547, 2019 - arxiv.org
To make deliberate progress towards more intelligent and more human-like artificial
systems, we need to be following an appropriate feedback signal: we need to be able to …
The modern mathematics of deep learning
We describe the new field of the mathematical analysis of deep learning. This field emerged
around a list of research questions that were not answered within the classical framework of …
Predicting with confidence on unseen distributions
Recent work has shown that the accuracy of machine learning models can vary substantially
when evaluated on a distribution that even slightly differs from that of the training data. As a …
Network pruning via performance maximization
Channel pruning is a class of powerful methods for model compression. When pruning a
neural network, it is ideal to obtain a sub-network with higher accuracy. However, a sub …
Exploring the limits of large scale pre-training
Recent developments in large-scale machine learning suggest that by scaling up data,
model size and training time properly, one might observe that improvements in pre-training …
Deep learning through the lens of example difficulty
Existing work on understanding deep learning often employs measures that compress all
data-dependent information into a few numbers. In this work, we adopt a perspective based …
Permutation equivariant neural functionals
This work studies the design of neural networks that can process the weights or gradients of
other neural networks, which we refer to as neural functional networks (NFNs). Despite a …