Deep learning: a statistical viewpoint

PL Bartlett, A Montanari, A Rakhlin - Acta Numerica, 2021 - cambridge.org
The remarkable practical success of deep learning has revealed some major surprises from
a theoretical perspective. In particular, simple gradient methods easily find near-optimal …

Surprises in high-dimensional ridgeless least squares interpolation

T Hastie, A Montanari, S Rosset, RJ Tibshirani - Annals of Statistics, 2022 - ncbi.nlm.nih.gov
Interpolators—estimators that achieve zero training error—have attracted growing attention
in machine learning, mainly because state-of-the-art neural networks appear to be models of …
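
The object studied here can be illustrated in a few lines. The sketch below (my own toy setup, not code from the paper) computes the minimum-norm least-squares interpolator via the pseudoinverse, i.e. the ridgeless limit of ridge regression, in an overparameterized problem with p > n; the dimensions and noise level are arbitrary.

# Minimum-norm ("ridgeless") least-squares interpolator: beta_hat = pinv(X) @ y
# achieves zero training error whenever p > n. Toy sizes; illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200                       # overparameterized: more features than samples
X = rng.standard_normal((n, p)) / np.sqrt(p)
beta_star = rng.standard_normal(p)   # ground-truth coefficients (hypothetical)
y = X @ beta_star + 0.1 * rng.standard_normal(n)

beta_hat = np.linalg.pinv(X) @ y     # limit of ridge regression as the penalty -> 0

train_mse = np.mean((X @ beta_hat - y) ** 2)
X_test = rng.standard_normal((1000, p)) / np.sqrt(p)
test_mse = np.mean((X_test @ beta_hat - X_test @ beta_star) ** 2)
print(f"train MSE ~ {train_mse:.2e}, test MSE ~ {test_mse:.3f}")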

Universality of empirical risk minimization

A Montanari, BN Saeed - Conference on Learning Theory, 2022 - proceedings.mlr.press
Consider supervised learning from i.i.d. samples {(y_i, x_i)}_{i ≤ n}, where x_i ∈ R^p are
feature vectors and y_i ∈ R are labels. We study empirical risk minimization over a class of …
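
To make the setup in this snippet concrete, here is a minimal empirical risk minimization sketch under assumptions of my own (logistic loss, linear predictors, plain gradient descent); the paper's analysis covers a general class of losses and predictors.

# Empirical risk minimization sketch: minimize the average logistic loss plus a
# small ridge penalty over linear predictors, by full-batch gradient descent.
import numpy as np

rng = np.random.default_rng(1)
n, p = 400, 100
X = rng.standard_normal((n, p)) / np.sqrt(p)          # feature vectors x_i in R^p
theta_star = rng.standard_normal(p)
y = np.where(X @ theta_star + 0.5 * rng.standard_normal(n) > 0, 1.0, -1.0)

lam = 1e-3

def empirical_risk(theta):
    margins = y * (X @ theta)
    return np.mean(np.log1p(np.exp(-margins))) + 0.5 * lam * theta @ theta

def grad(theta):
    margins = y * (X @ theta)
    s = -1.0 / (1.0 + np.exp(margins))                # derivative of log(1 + exp(-m))
    return X.T @ (s * y) / n + lam * theta

theta = np.zeros(p)
for _ in range(500):                                  # gradient descent on the empirical risk
    theta -= 10.0 * grad(theta)
print("empirical risk:", empirical_risk(theta))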

Universality laws for high-dimensional learning with random features

H Hu, YM Lu - IEEE Transactions on Information Theory, 2022 - ieeexplore.ieee.org
We prove a universality theorem for learning with random features. Our result shows that, in
terms of training and generalization errors, a random feature model with a nonlinear …
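
The model in question is easy to write down. Below is a hedged sketch (sizes, activation and regularization are my choices): random first-layer weights are drawn once and frozen, a nonlinearity is applied, and only the second-layer coefficients are learned by ridge regression on the resulting features.

# Random feature model: Phi = sigma(X W^T) with fixed random W, then ridge
# regression on the features. ReLU and all sizes are illustrative choices.
import numpy as np

rng = np.random.default_rng(2)
n, d, N = 300, 50, 500               # samples, input dimension, random features
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)    # toy target

W = rng.standard_normal((N, d)) / np.sqrt(d)          # frozen random weights
Phi = np.maximum(X @ W.T, 0.0)                        # (n, N) ReLU random features

lam = 1e-2
a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y)   # ridge on the features
print("train MSE:", np.mean((Phi @ a - y) ** 2))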

Classifying high-dimensional gaussian mixtures: Where kernel methods fail and neural networks succeed

M Refinetti, S Goldt, F Krzakala… - … on Machine Learning, 2021 - proceedings.mlr.press
A recent series of theoretical works showed that the dynamics of neural networks with a
certain initialisation are well-captured by kernel methods. Concurrent empirical work …
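
For reference, the data model named in the title looks roughly like the following sketch, where a two-component Gaussian mixture in high dimension is fit with kernel ridge regression; the kernel, bandwidth and sizes are assumptions of mine, and the paper's actual comparison with trained neural networks is not reproduced here.

# Two-component high-dimensional Gaussian mixture, classified with kernel ridge
# regression on an RBF kernel (illustrative baseline, not the paper's experiment).
import numpy as np

rng = np.random.default_rng(3)
d, n = 200, 400
mu = rng.standard_normal(d) / np.sqrt(d)              # cluster mean direction
labels = rng.choice([-1.0, 1.0], size=n)
X = labels[:, None] * mu + rng.standard_normal((n, d)) / np.sqrt(d)

def rbf_kernel(A, B, gamma=1.0):
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * sq)

K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + 1e-3 * np.eye(n), labels)  # kernel ridge fit
pred = np.sign(K @ alpha)
print("train accuracy:", np.mean(pred == labels))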

Towards a mathematical understanding of neural network-based machine learning: what we know and what we don't

C Ma, S Wojtowytsch, L Wu - arXiv preprint arXiv:2009.10713, 2020 - arxiv.org
The purpose of this article is to review the achievements made in the last few years towards
the understanding of the reasons behind the success and subtleties of neural network …

Triple descent and the two kinds of overfitting: Where & why do they appear?

S d'Ascoli, L Sagun, G Biroli - Advances in neural …, 2020 - proceedings.neurips.cc
A recent line of research has highlighted the existence of a "double descent" phenomenon in
deep learning, whereby increasing the number of training examples N causes the …
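
The basic double-descent picture referred to here can be reproduced with a toy experiment (the setup below is mine, not the paper's): test error of the minimum-norm least-squares fit as the number of retained features p sweeps past the sample size n, where it typically peaks before descending again.

# Double-descent sketch: misspecified linear model, min-norm least squares,
# test error versus the number of features p (peak expected near p = n).
import numpy as np

rng = np.random.default_rng(4)
n, d = 100, 400                                   # training samples, total coordinates
X_all = rng.standard_normal((1000 + n, d))
beta = rng.standard_normal(d) / np.sqrt(d)
y_all = X_all @ beta + 0.5 * rng.standard_normal(1000 + n)
X_tr, y_tr = X_all[:n], y_all[:n]
X_te, y_te = X_all[n:], y_all[n:]

for p in (20, 50, 90, 100, 110, 200, 400):        # keep only the first p coordinates
    b = np.linalg.pinv(X_tr[:, :p]) @ y_tr        # min-norm interpolator once p >= n
    err = np.mean((X_te[:, :p] @ b - y_te) ** 2)
    print(f"p={p:4d}  test MSE={err:.3f}")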

Provable benefits of overparameterization in model compression: From double descent to pruning neural networks

X Chang, Y Li, S Oymak, C Thrampoulidis - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Deep networks are typically trained with many more parameters than the size of the training
dataset. Recent empirical evidence indicates that the practice of overparameterization not …
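
A stripped-down version of the train-then-prune pipeline discussed here might look as follows; the linear setting, sizes and sparsity level are my own assumptions, and the paper's analysis is far more general.

# Train an overparameterized (interpolating) linear model, magnitude-prune the
# weights, then retrain on the retained support. Illustrative sizes only.
import numpy as np

rng = np.random.default_rng(5)
n, p, k = 80, 300, 30
X = rng.standard_normal((n, p)) / np.sqrt(p)
beta_star = np.zeros(p)
beta_star[:k] = rng.standard_normal(k)             # sparse ground truth (hypothetical)
y = X @ beta_star + 0.1 * rng.standard_normal(n)

beta_dense = np.linalg.pinv(X) @ y                 # overparameterized interpolating fit
keep = np.argsort(np.abs(beta_dense))[-k:]         # magnitude pruning: keep k largest weights
beta_pruned = np.zeros(p)
beta_pruned[keep] = np.linalg.pinv(X[:, keep]) @ y # retrain on the pruned support

for name, b in [("dense", beta_dense), ("pruned+retrained", beta_pruned)]:
    print(name, "estimation error:", np.linalg.norm(b - beta_star))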

A dynamical central limit theorem for shallow neural networks

Z Chen, G Rotskoff, J Bruna… - Advances in Neural …, 2020 - proceedings.neurips.cc
Recent theoretical work has characterized the dynamics and convergence properties for
wide shallow neural networks trained via gradient descent; the asymptotic regime in which …
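
The object of study, a wide one-hidden-layer network trained by gradient descent, can be set up in a few lines. The mean-field 1/N output scaling and the learning-rate scaling below are assumptions on my part; the paper characterizes the fluctuations around the associated infinite-width limit, which this sketch does not compute.

# Wide shallow network with mean-field scaling f(x) = (1/N) sum_j a_j tanh(w_j . x),
# trained by full-batch gradient descent on squared loss. Toy sizes; illustrative only.
import numpy as np

rng = np.random.default_rng(6)
n, d, N = 200, 10, 2000                     # samples, input dim, hidden width
X = rng.standard_normal((n, d))
y = np.tanh(X[:, 0])                        # toy target

W = rng.standard_normal((N, d))             # hidden-layer weights
a = rng.standard_normal(N)                  # output weights

def forward(X):
    return np.tanh(X @ W.T) @ a / N         # mean-field scaling: average over neurons

lr = 0.5
for _ in range(500):
    H = np.tanh(X @ W.T)                    # (n, N) hidden activations
    r = forward(X) - y                      # residuals
    grad_a = H.T @ r / (n * N)
    grad_W = ((1.0 - H ** 2) * (r[:, None] * a[None, :] / N)).T @ X / n
    a -= lr * N * grad_a                    # steps scaled by N so each neuron moves O(1)
    W -= lr * N * grad_W
print("train MSE:", np.mean((forward(X) - y) ** 2))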

Capturing the learning curves of generic features maps for realistic data sets with a teacher-student model

B Loureiro, C Gerbelot, H Cui, S Goldt, F Krzakala… - stat, 2021 - marcmezard.fr
Teacher-student models provide a powerful framework in which the typical case
performance of high-dimensional supervised learning tasks can be studied in closed form. In …
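
As a minimal illustration of the framework (with a feature map, label rule and sample sizes chosen by me rather than taken from the paper), the sketch below has a fixed teacher vector generate labels and traces the learning curve of a ridge-regression student as the number of samples grows.

# Teacher-student sketch: a fixed teacher labels Gaussian feature vectors, a
# ridge-regression student is fit at several sample sizes (a toy learning curve).
import numpy as np

rng = np.random.default_rng(7)
d = 100
teacher = rng.standard_normal(d) / np.sqrt(d)

def sample(n):
    X = rng.standard_normal((n, d))
    return X, np.sign(X @ teacher)          # noiseless teacher labels

X_te, y_te = sample(5000)
for n in (50, 100, 200, 400, 800):          # learning curve: test error versus sample size
    X, y = sample(n)
    w = np.linalg.solve(X.T @ X + 1e-2 * np.eye(d), X.T @ y)   # ridge student
    err = np.mean(np.sign(X_te @ w) != y_te)
    print(f"n={n:4d}  test error={err:.3f}")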