Deep learning: a statistical viewpoint

PL Bartlett, A Montanari, A Rakhlin - Acta Numerica, 2021 - cambridge.org
The remarkable practical success of deep learning has revealed some major surprises from
a theoretical perspective. In particular, simple gradient methods easily find near-optimal …

Surprises in high-dimensional ridgeless least squares interpolation

T Hastie, A Montanari, S Rosset, RJ Tibshirani - Annals of Statistics, 2022 - ncbi.nlm.nih.gov
Interpolators—estimators that achieve zero training error—have attracted growing attention
in machine learning, mainly because state-of-the-art neural networks appear to be models of …
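
The object studied here can be illustrated in a few lines. The sketch below (my own toy setup, not code from the paper) computes the minimum-norm least-squares interpolator via the pseudoinverse, i.e. the ridgeless limit of ridge regression, in an overparameterized problem with p > n; the dimensions and noise level are arbitrary.

# Minimum-norm ("ridgeless") least-squares interpolator: beta_hat = pinv(X) @ y
# achieves zero training error whenever p > n. Toy sizes; illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200                       # overparameterized: more features than samples
X = rng.standard_normal((n, p)) / np.sqrt(p)
beta_star = rng.standard_normal(p)   # ground-truth coefficients (hypothetical)
y = X @ beta_star + 0.1 * rng.standard_normal(n)

beta_hat = np.linalg.pinv(X) @ y     # limit of ridge regression as the penalty -> 0

train_mse = np.mean((X @ beta_hat - y) ** 2)
X_test = rng.standard_normal((1000, p)) / np.sqrt(p)
test_mse = np.mean((X_test @ beta_hat - X_test @ beta_star) ** 2)
print(f"train MSE ~ {train_mse:.2e}, test MSE ~ {test_mse:.3f}")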

Universality of empirical risk minimization

A Montanari, BN Saeed - Conference on Learning Theory, 2022 - proceedings.mlr.press
Consider supervised learning from i.i.d. samples {(y_i, x_i)}_{i ≤ n}, where x_i ∈ R^p are
feature vectors and y_i ∈ R are labels. We study empirical risk minimization over a class of …
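
To make the setup in this snippet concrete, here is a minimal empirical risk minimization sketch under assumptions of my own (logistic loss, linear predictors, plain gradient descent); the paper's analysis covers a general class of losses and predictors.

# Empirical risk minimization sketch: minimize the average logistic loss plus a
# small ridge penalty over linear predictors, by full-batch gradient descent.
import numpy as np

rng = np.random.default_rng(1)
n, p = 400, 100
X = rng.standard_normal((n, p)) / np.sqrt(p)          # feature vectors x_i in R^p
theta_star = rng.standard_normal(p)
y = np.where(X @ theta_star + 0.5 * rng.standard_normal(n) > 0, 1.0, -1.0)

lam = 1e-3

def empirical_risk(theta):
    margins = y * (X @ theta)
    return np.mean(np.log1p(np.exp(-margins))) + 0.5 * lam * theta @ theta

def grad(theta):
    margins = y * (X @ theta)
    s = -1.0 / (1.0 + np.exp(margins))                # derivative of log(1 + exp(-m))
    return X.T @ (s * y) / n + lam * theta

theta = np.zeros(p)
for _ in range(500):                                  # gradient descent on the empirical risk
    theta -= 10.0 * grad(theta)
print("empirical risk:", empirical_risk(theta))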

Universality laws for high-dimensional learning with random features

H Hu, YM Lu - IEEE Transactions on Information Theory, 2022 - ieeexplore.ieee.org
We prove a universality theorem for learning with random features. Our result shows that, in
terms of training and generalization errors, a random feature model with a nonlinear …
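
The model in question is easy to write down. Below is a hedged sketch (sizes, activation and regularization are my choices): random first-layer weights are drawn once and frozen, a nonlinearity is applied, and only the second-layer coefficients are learned by ridge regression on the resulting features.

# Random feature model: Phi = sigma(X W^T) with fixed random W, then ridge
# regression on the features. ReLU and all sizes are illustrative choices.
import numpy as np

rng = np.random.default_rng(2)
n, d, N = 300, 50, 500               # samples, input dimension, random features
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)    # toy target

W = rng.standard_normal((N, d)) / np.sqrt(d)          # frozen random weights
Phi = np.maximum(X @ W.T, 0.0)                        # (n, N) ReLU random features

lam = 1e-2
a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y)   # ridge on the features
print("train MSE:", np.mean((Phi @ a - y) ** 2))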

Classifying high-dimensional gaussian mixtures: Where kernel methods fail and neural networks succeed

M Refinetti, S Goldt, F Krzakala… - … on Machine Learning, 2021 - proceedings.mlr.press
A recent series of theoretical works showed that the dynamics of neural networks with a
certain initialisation are well-captured by kernel methods. Concurrent empirical work …
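
For reference, the data model named in the title looks roughly like the following sketch, where a two-component Gaussian mixture in high dimension is fit with kernel ridge regression; the kernel, bandwidth and sizes are assumptions of mine, and the paper's actual comparison with trained neural networks is not reproduced here.

# Two-component high-dimensional Gaussian mixture, classified with kernel ridge
# regression on an RBF kernel (illustrative baseline, not the paper's experiment).
import numpy as np

rng = np.random.default_rng(3)
d, n = 200, 400
mu = rng.standard_normal(d) / np.sqrt(d)              # cluster mean direction
labels = rng.choice([-1.0, 1.0], size=n)
X = labels[:, None] * mu + rng.standard_normal((n, d)) / np.sqrt(d)

def rbf_kernel(A, B, gamma=1.0):
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + 1e-3 * np.eye(n), labels)  # kernel ridge fit
pred = np.sign(K @ alpha)
print("train accuracy:", np.mean(pred == labels))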

Towards a mathematical understanding of neural network-based machine learning: what we know and what we don't

C Ma, S Wojtowytsch, L Wu - arXiv preprint arXiv:2009.10713, 2020 - arxiv.org
The purpose of this article is to review the achievements made in the last few years towards
the understanding of the reasons behind the success and subtleties of neural network …

Triple descent and the two kinds of overfitting: Where & why do they appear?

S d'Ascoli, L Sagun, G Biroli - Advances in neural …, 2020 - proceedings.neurips.cc
A recent line of research has highlighted the existence of a "double descent" phenomenon in
deep learning, whereby increasing the number of training examples N causes the …
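
The basic double-descent picture referred to here can be reproduced with a toy experiment (the setup below is mine, not the paper's): test error of the minimum-norm least-squares fit as the number of retained features p sweeps past the sample size n, where it typically peaks before descending again.

# Double-descent sketch: misspecified linear model, min-norm least squares,
# test error versus the number of features p (peak expected near p = n).
import numpy as np

rng = np.random.default_rng(4)
n, d = 100, 400                                   # training samples, total coordinates
X_all = rng.standard_normal((1000 + n, d))
beta = rng.standard_normal(d) / np.sqrt(d)
y_all = X_all @ beta + 0.5 * rng.standard_normal(1000 + n)
X_tr, y_tr = X_all[:n], y_all[:n]
X_te, y_te = X_all[n:], y_all[n:]

for p in (20, 50, 90, 100, 110, 200, 400):        # keep only the first p coordinates
    b = np.linalg.pinv(X_tr[:, :p]) @ y_tr        # min-norm interpolator once p >= n
    err = np.mean((X_te[:, :p] @ b - y_te) ** 2)
    print(f"p={p:4d}  test MSE={err:.3f}")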

Provable benefits of overparameterization in model compression: From double descent to pruning neural networks

X Chang, Y Li, S Oymak, C Thrampoulidis - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Deep networks are typically trained with many more parameters than the size of the training
dataset. Recent empirical evidence indicates that the practice of overparameterization not …
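
A stripped-down version of the train-then-prune pipeline discussed here might look as follows; the linear setting, sizes and sparsity level are my own assumptions, and the paper's analysis is far more general.

# Train an overparameterized (interpolating) linear model, magnitude-prune the
# weights, then retrain on the retained support. Illustrative sizes only.
import numpy as np

rng = np.random.default_rng(5)
n, p, k = 80, 300, 30
X = rng.standard_normal((n, p)) / np.sqrt(p)
beta_star = np.zeros(p)
beta_star[:k] = rng.standard_normal(k)             # sparse ground truth (hypothetical)
y = X @ beta_star + 0.1 * rng.standard_normal(n)

beta_dense = np.linalg.pinv(X) @ y                 # overparameterized interpolating fit
keep = np.argsort(np.abs(beta_dense))[-k:]         # magnitude pruning: keep k largest weights
beta_pruned = np.zeros(p)
beta_pruned[keep] = np.linalg.pinv(X[:, keep]) @ y # retrain on the pruned support

for name, b in [("dense", beta_dense), ("pruned+retrained", beta_pruned)]:
    print(name, "estimation error:", np.linalg.norm(b - beta_star))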

A dynamical central limit theorem for shallow neural networks

Z Chen, G Rotskoff, J Bruna… - Advances in Neural …, 2020 - proceedings.neurips.cc
Recent theoretical work has characterized the dynamics and convergence properties for
wide shallow neural networks trained via gradient descent; the asymptotic regime in which …
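
The object of study, a wide one-hidden-layer network trained by gradient descent, can be set up in a few lines. The mean-field 1/N output scaling and the learning-rate scaling below are assumptions on my part; the paper characterizes the fluctuations around the associated infinite-width limit, which this sketch does not compute.

# Wide shallow network with mean-field scaling f(x) = (1/N) sum_j a_j tanh(w_j . x),
# trained by full-batch gradient descent on squared loss. Toy sizes; illustrative only.
import numpy as np

rng = np.random.default_rng(6)
n, d, N = 200, 10, 2000                     # samples, input dim, hidden width
X = rng.standard_normal((n, d))
y = np.tanh(X[:, 0])                        # toy target

W = rng.standard_normal((N, d))             # hidden-layer weights
a = rng.standard_normal(N)                  # output weights

def forward(X):
    return np.tanh(X @ W.T) @ a / N         # mean-field scaling: average over neurons

lr = 0.5
for _ in range(500):
    H = np.tanh(X @ W.T)                    # (n, N) hidden activations
    r = forward(X) - y                      # residuals
    grad_a = H.T @ r / (n * N)
    grad_W = ((1.0 - H ** 2) * (r[:, None] * a[None, :] / N)).T @ X / n
    a -= lr * N * grad_a                    # steps scaled by N so each neuron moves O(1)
    W -= lr * N * grad_W
print("train MSE:", np.mean((forward(X) - y) ** 2))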

Capturing the learning curves of generic features maps for realistic data sets with a teacher-student model

B Loureiro, C Gerbelot, H Cui, S Goldt, F Krzakala… - stat, 2021 - marcmezard.fr
Teacher-student models provide a powerful framework in which the typical case
performance of high-dimensional supervised learning tasks can be studied in closed form. In …
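
As a minimal illustration of the framework (with a feature map, label rule and sample sizes chosen by me rather than taken from the paper), the sketch below has a fixed teacher vector generate labels and traces the learning curve of a ridge-regression student as the number of samples grows.

# Teacher-student sketch: a fixed teacher labels Gaussian feature vectors, a
# ridge-regression student is fit at several sample sizes (a toy learning curve).
import numpy as np

rng = np.random.default_rng(7)
d = 100
teacher = rng.standard_normal(d) / np.sqrt(d)

def sample(n):
    X = rng.standard_normal((n, d))
    return X, np.sign(X @ teacher)          # noiseless teacher labels

X_te, y_te = sample(5000)
for n in (50, 100, 200, 400, 800):          # learning curve: test error versus sample size
    X, y = sample(n)
    w = np.linalg.solve(X.T @ X + 1e-2 * np.eye(d), X.T @ y)   # ridge student
    err = np.mean(np.sign(X_te @ w) != y_te)
    print(f"n={n:4d}  test error={err:.3f}")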