Deep learning: a statistical viewpoint

PL Bartlett, A Montanari, A Rakhlin - Acta Numerica, 2021 - cambridge.org
The remarkable practical success of deep learning has revealed some major surprises from
a theoretical perspective. In particular, simple gradient methods easily find near-optimal …

High-dimensional asymptotics of feature learning: How one gradient step improves the representation

J Ba, MA Erdogdu, T Suzuki, Z Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a
two-layer neural network: $f(\boldsymbol{x})=\frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma$ …
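The formula is truncated in the snippet; as a minimal sketch of the setting it describes, the code below (not the authors' implementation; the ReLU activation, toy target, squared loss, and step size are illustrative assumptions, as is completing the truncated formula with the standard form $\sigma(\boldsymbol{W}\boldsymbol{x})$) builds a width-$N$ two-layer network and takes a single full-batch gradient step on $\boldsymbol{W}$ while the second layer $\boldsymbol{a}$ stays frozen.

```python
# Illustrative sketch (not the authors' code): one full-batch gradient step on the
# first-layer weights W of a two-layer network f(x) = a^T sigma(W x) / sqrt(N),
# with the second layer a held fixed. Activation, target, loss, and step size
# are hypothetical choices made only for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, d, N = 200, 50, 100            # samples, input dimension, hidden width
X = rng.standard_normal((n, d))
y = np.tanh(X[:, 0])              # toy single-index target

W = rng.standard_normal((N, d)) / np.sqrt(d)
a = rng.standard_normal(N) / np.sqrt(N)

def predict(W):
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(N)   # ReLU features

# Gradient of the squared loss (1/2n) ||f(X) - y||^2 with respect to W.
resid = predict(W) - y                        # (n,)
act_grad = (X @ W.T > 0).astype(float)        # ReLU derivative, (n, N)
grad_W = ((resid[:, None] * act_grad) * a / np.sqrt(N)).T @ X / n   # (N, d)

eta = 1.0
W_after_one_step = W - eta * grad_W
print("train MSE before:", np.mean(resid**2))
print("train MSE after :", np.mean((predict(W_after_one_step) - y)**2))
```

The point of studying this single step is how much it already moves the first-layer representation away from its random initialization.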

A geometric analysis of neural collapse with unconstrained features

Z Zhu, T Ding, J Zhou, X Li, C You… - Advances in Neural …, 2021 - proceedings.neurips.cc
We provide the first global optimization landscape analysis of Neural Collapse, an intriguing
empirical phenomenon that arises in the last-layer classifiers and features of neural …
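For context, a hedged sketch of the unconstrained-features idea named in the title: the last-layer features are treated as free optimization variables and a regularized cross-entropy loss is minimized jointly over the features and the linear classifier. The dimensions, regularization weights, and the plain gradient-descent loop below are illustrative assumptions, not the paper's construction.

```python
# Hedged sketch of an unconstrained-features model: features H and classifier W
# are both free variables; regularized cross-entropy is minimized over both.
import numpy as np

rng = np.random.default_rng(4)
K, d, n_per = 4, 16, 10                   # classes, feature dim, samples per class
labels = np.repeat(np.arange(K), n_per)
W = rng.standard_normal((K, d)) * 0.1
H = rng.standard_normal((K * n_per, d)) * 0.1
lam_W, lam_H, lr = 5e-3, 5e-3, 0.5

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

for _ in range(2000):
    P = softmax(H @ W.T)                        # (n, K) class probabilities
    P[np.arange(len(labels)), labels] -= 1.0    # gradient of CE w.r.t. logits
    P /= len(labels)
    W -= lr * (P.T @ H + lam_W * W)
    H -= lr * (P @ W + lam_H * H)

# Measure how tightly within-class features concentrate around their class means.
means = np.stack([H[labels == k].mean(axis=0) for k in range(K)])
within = np.mean([np.linalg.norm(H[i] - means[labels[i]]) for i in range(len(labels))])
print("average within-class deviation:", within)
```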

Benign overfitting in two-layer convolutional neural networks

Y Cao, Z Chen, M Belkin, Q Gu - Advances in Neural …, 2022 - proceedings.neurips.cc
Modern neural networks often have great expressive power and can be trained to overfit the
training data, while still achieving a good test performance. This phenomenon is referred to …

Benign overfitting in ridge regression

A Tsigler, PL Bartlett - Journal of Machine Learning Research, 2023 - jmlr.org
In many modern applications of deep learning the neural network has many more
parameters than there are data points used for its training. Motivated by those practices, a large …
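A minimal sketch of the estimator in question (not code from the paper; the Gaussian design, dimensions, and regularization values are illustrative assumptions): ridge regression with many more parameters than samples, computed in its dual form so that the regularization going to zero recovers the minimum-norm interpolating solution.

```python
# Minimal sketch: ridge regression in the overparameterized regime p > n,
# including the ridgeless (minimum-norm interpolating) limit.
import numpy as np

rng = np.random.default_rng(1)
n, p = 50, 500                         # many more parameters than samples
X = rng.standard_normal((n, p))
theta_star = np.zeros(p); theta_star[0] = 1.0
y = X @ theta_star + 0.1 * rng.standard_normal(n)

def ridge(X, y, lam):
    # Dual form: theta = X^T (X X^T + lam I)^{-1} y; as lam -> 0 this tends to
    # the minimum-norm solution that interpolates the training data.
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(X.shape[0]), y)

for lam in (1.0, 1e-8):
    theta = ridge(X, y, lam)
    print(f"lam={lam:g}  train MSE={np.mean((X @ theta - y)**2):.2e}  "
          f"param error={np.linalg.norm(theta - theta_star):.3f}")
```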

The modern mathematics of deep learning

J Berner, P Grohs, G Kutyniok… - arXiv preprint arXiv …, 2021 - cambridge.org
We describe the new field of the mathematical analysis of deep learning. This field emerged
around a list of research questions that were not answered within the classical framework of …

Universality of empirical risk minimization

A Montanari, BN Saeed - Conference on Learning Theory, 2022 - proceedings.mlr.press
Consider supervised learning from i.i.d. samples {(y_i, x_i)}_{i ≤ n} where x_i ∈ R^p are
feature vectors and y_i ∈ R are labels. We study empirical risk minimization over a class of …
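The model class over which the risk is minimized is cut off in the snippet, so the following is only a generic illustration of empirical risk minimization: a hypothetical linear class with logistic loss, fit by gradient descent on the average loss over the n samples.

```python
# Generic ERM illustration (the paper's model class is truncated in the snippet):
# minimize the empirical logistic risk over a hypothetical linear class.
import numpy as np

rng = np.random.default_rng(2)
n, p = 400, 20
X = rng.standard_normal((n, p))
theta_star = rng.standard_normal(p) / np.sqrt(p)
y = np.where(X @ theta_star + 0.5 * rng.standard_normal(n) > 0, 1.0, -1.0)

def empirical_risk(theta):
    # Average logistic loss (1/n) sum_i log(1 + exp(-y_i <theta, x_i>)).
    return np.mean(np.log1p(np.exp(-y * (X @ theta))))

theta = np.zeros(p)
lr = 0.5
for _ in range(500):
    margins = y * (X @ theta)
    grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / n
    theta -= lr * grad

print("empirical risk:", empirical_risk(theta))
print("train accuracy:", np.mean(np.sign(X @ theta) == y))
```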

Benign overfitting without linearity: Neural network classifiers trained by gradient descent for noisy linear data

S Frei, NS Chatterji, P Bartlett - Conference on Learning …, 2022 - proceedings.mlr.press
Benign overfitting, the phenomenon where interpolating models generalize well in the
presence of noisy data, was first observed in neural network models trained with gradient …

Generalization error of random feature and kernel methods: hypercontractivity and kernel matrix concentration

S Mei, T Misiakiewicz, A Montanari - Applied and Computational Harmonic …, 2022 - Elsevier
Consider the classical supervised learning problem: we are given data (y_i, x_i), i ≤ n, with y_i a
response and x_i ∈ X a covariate vector, and try to learn a model f̂ : X → R to predict future …
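As a concrete, hedged illustration of the random feature methods studied in this line of work, the sketch below fits only the output coefficients of a model whose first-layer weights are frozen at random values, using ridge regression; the ReLU activation, data distribution, and regularization value are assumptions made for illustration.

```python
# Random feature regression sketch: f(x) = sum_j a_j * sigma(<w_j, x>) with the
# w_j drawn at random and only the coefficients a fit to the data by ridge.
import numpy as np

rng = np.random.default_rng(3)
n, d, N = 300, 10, 200            # samples, input dim, number of random features
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

W = rng.standard_normal((N, d)) / np.sqrt(d)     # frozen random first layer
Phi = np.maximum(X @ W.T, 0.0)                   # ReLU feature map, shape (n, N)

lam = 1e-2
a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y)   # ridge fit

X_test = rng.standard_normal((n, d))
y_test = np.sin(X_test[:, 0])
Phi_test = np.maximum(X_test @ W.T, 0.0)
print("test MSE:", np.mean((Phi_test @ a - y_test)**2))
```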

Provable guarantees for neural networks via gradient feature learning

Z Shi, J Wei, Y Liang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …