Deep learning: a statistical viewpoint
The remarkable practical success of deep learning has revealed some major surprises from
a theoretical perspective. In particular, simple gradient methods easily find near-optimal …
High-dimensional asymptotics of feature learning: How one gradient step improves the representation
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer neural network: $f(\boldsymbol{x})=\frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}\boldsymbol{x})$ …
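To make the setup concrete, here is a minimal NumPy sketch of this model and a single full-batch gradient step on $\boldsymbol{W}$ under squared loss. The activation (tanh), dimensions, learning rate, and Gaussian data are illustrative assumptions, not the paper's exact setting.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, N = 200, 50, 100   # samples, input dim, hidden width (illustrative)
eta = 1.0                # learning rate for the single step (assumed)

# Random initialization for f(x) = a^T sigma(W x) / sqrt(N)
W = rng.normal(size=(N, d)) / np.sqrt(d)
a = rng.normal(size=N)

X = rng.normal(size=(n, d))   # synthetic Gaussian inputs (assumption)
y = rng.normal(size=n)        # placeholder targets (assumption)

def forward(W, X):
    """f(x) = a^T sigma(W x) / sqrt(N), with sigma = tanh (assumed)."""
    return np.tanh(X @ W.T) @ a / np.sqrt(N)

# One full-batch gradient step on W only, under the empirical squared loss
resid = forward(W, X) - y                 # (n,)
sig_prime = 1.0 - np.tanh(X @ W.T) ** 2   # sigma'(W x), shape (n, N)
grad_W = ((resid[:, None] * sig_prime) * a).T @ X / (n * np.sqrt(N))
W -= eta * grad_W
```

The paper's interest is in how this one step changes the learned representation; the sketch only fixes the mechanics of the update.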
A geometric analysis of neural collapse with unconstrained features
We provide the first global optimization landscape analysis of Neural Collapse--an intriguing
empirical phenomenon that arises in the last-layer classifiers and features of neural …
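For reference, the geometry that neural collapse predicts for the last-layer class means and classifier is a simplex equiangular tight frame (ETF). Below is a minimal NumPy sketch constructing one and checking its equal pairwise angles; the class count $K$ is an illustrative assumption.

```python
import numpy as np

K = 4  # number of classes (illustrative)

# Simplex ETF: K unit vectors with all pairwise inner products -1/(K-1),
# the configuration neural collapse predicts for class-mean features.
M = np.sqrt(K / (K - 1)) * (np.eye(K) - np.ones((K, K)) / K)

gram = M.T @ M            # columns are unit norm, so this is the cosine matrix
print(np.round(gram, 3))  # diagonal 1, off-diagonals all equal -1/(K-1)
```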
Benign overfitting in two-layer convolutional neural networks
Modern neural networks often have great expressive power and can be trained to overfit the
training data, while still achieving a good test performance. This phenomenon is referred to …
Benign overfitting in ridge regression
A Tsigler, PL Bartlett - Journal of Machine Learning Research, 2023 - jmlr.org
In many modern applications of deep learning the neural network has many more
parameters than the data points used for its training. Motivated by those practices, a large …
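As a concrete instance of the regime in question, here is a minimal sketch of ridge regression with many more parameters than samples ($p \gg n$); the dimensions, noise level, and ridge penalty are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, lam = 50, 500, 1e-3   # n << p: overparameterized regime (illustrative)

X = rng.normal(size=(n, p))
beta_star = rng.normal(size=p) / np.sqrt(p)
y = X @ beta_star + 0.1 * rng.normal(size=n)   # noisy responses (assumption)

# Ridge estimator via the kernel-form identity
# (X^T X + lam I_p)^{-1} X^T y = X^T (X X^T + lam I_n)^{-1} y,
# an (n x n) solve that is cheap when n << p.
beta_hat = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

train_mse = np.mean((X @ beta_hat - y) ** 2)
X_test = rng.normal(size=(2000, p))
test_mse = np.mean((X_test @ (beta_hat - beta_star)) ** 2)
print(train_mse, test_mse)   # near-zero train error, finite test error
```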
The modern mathematics of deep learning
We describe the new field of the mathematical analysis of deep learning. This field emerged
around a list of research questions that were not answered within the classical framework of …
Universality of empirical risk minimization
A Montanari, BN Saeed - Conference on Learning Theory, 2022 - proceedings.mlr.press
Consider supervised learning from i.i.d. samples $\{(y_i, \boldsymbol{x}_i)\}_{i\le n}$ where $\boldsymbol{x}_i\in\mathbb{R}^p$ are
feature vectors and $y_i\in\mathbb{R}$ are labels. We study empirical risk minimization over a class of …
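To fix ideas, here is one instance of the ERM problem in this snippet: gradient descent on the empirical logistic risk over linear predictors with Gaussian features. The loss, data model, and dimensions are illustrative assumptions, not the full class of models treated in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 300, 100   # samples and feature dimension (illustrative)

X = rng.normal(size=(n, p))                  # i.i.d. Gaussian features (assumption)
theta_star = rng.normal(size=p) / np.sqrt(p)
y = np.sign(X @ theta_star + 0.5 * rng.normal(size=n))   # labels in {-1, +1}

def empirical_risk(theta):
    # Logistic loss, one example of the losses such universality results cover
    return np.mean(np.log1p(np.exp(-y * (X @ theta))))

# Plain gradient descent on the empirical risk
theta = np.zeros(p)
for _ in range(500):
    s = -y / (1.0 + np.exp(y * (X @ theta)))   # derivative of log(1 + e^{-yz}) in z
    theta -= 0.5 * (s @ X) / n

print(empirical_risk(theta))
```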
Benign overfitting without linearity: Neural network classifiers trained by gradient descent for noisy linear data
Benign overfitting, the phenomenon where interpolating models generalize well in the
presence of noisy data, was first observed in neural network models trained with gradient …
Generalization error of random feature and kernel methods: hypercontractivity and kernel matrix concentration
Consider the classical supervised learning problem: we are given data $(y_i, x_i)$, $i\le n$, with $y_i$ a
response and $x_i\in\mathcal{X}$ a covariate vector, and try to learn a model $\hat{f}:\mathcal{X}\to\mathbb{R}$ to predict future …
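As a concrete version of the random feature methods the title refers to, here is a minimal sketch of ridge regression on a fixed random ReLU feature map; the sizes, activation, and toy target are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, N, lam = 200, 30, 1000, 1e-2   # samples, input dim, features, ridge (illustrative)

X = rng.normal(size=(n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)   # toy target (assumption)

# Fixed random first layer: phi(x) = relu(Theta x); only the ridge fit is trained
Theta = rng.normal(size=(N, d)) / np.sqrt(d)
Phi = np.maximum(X @ Theta.T, 0.0)               # (n, N) random features

# Ridge fit in kernel form: an (n x n) solve instead of (N x N)
alpha = np.linalg.solve(Phi @ Phi.T + lam * np.eye(n), y)
w = Phi.T @ alpha                                 # feature-space weights

def predict(X_new):
    return np.maximum(X_new @ Theta.T, 0.0) @ w

print(np.mean((predict(X) - y) ** 2))             # training error of the fit
```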
Provable guarantees for neural networks via gradient feature learning
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …