High-dimensional asymptotics of feature learning: How one gradient step improves the representation
We study the first gradient descent step on the first-layer parameters $\boldsymbol {W} $ in a
two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma$ …
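A minimal sketch of the setting described in this entry: one gradient descent step on the first-layer weights $\boldsymbol{W}$ of a two-layer network $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}\boldsymbol{x})$, with the second layer held fixed. The pre-activation form $\boldsymbol{W}\boldsymbol{x}$, the Gaussian data, the squared loss, and the step size are illustrative assumptions, not details taken from the snippet.

```python
# Sketch (not the paper's exact protocol): one gradient step on the first-layer
# weights W of a two-layer network f(x) = a^T sigma(W x) / sqrt(N).
import numpy as np

rng = np.random.default_rng(0)
n, d, N = 2048, 512, 256                       # samples, input dim, hidden width (assumed)
eta = 1.0                                      # learning rate (assumed)

X = rng.standard_normal((n, d)) / np.sqrt(d)   # feature vectors
y = np.tanh(X @ rng.standard_normal(d))        # placeholder teacher labels (assumed)

W = rng.standard_normal((N, d)) / np.sqrt(d)   # first-layer weights
a = rng.standard_normal(N)                     # second-layer weights (kept fixed here)

sigma = np.tanh
Z = X @ W.T                                    # pre-activations, shape (n, N)
f = (sigma(Z) @ a) / np.sqrt(N)                # network outputs

# Gradient of the loss (1/(2n)) * sum_i (f_i - y_i)^2 with respect to W only.
resid = f - y
dL_dZ = (resid[:, None] * (1.0 - sigma(Z) ** 2)) * a[None, :] / np.sqrt(N)
grad_W = dL_dZ.T @ X / n                       # shape (N, d)

W_one_step = W - eta * grad_W                  # first-layer representation after one step
```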
Universality of empirical risk minimization
A Montanari, BN Saeed - Conference on Learning Theory, 2022 - proceedings.mlr.press
Consider supervised learning from iid samples {(y_i, x_i)}_{i ≤ n} where x_i ∈ R^p are
feature vectors and y_i ∈ R are labels. We study empirical risk minimization over a class of …
More than a toy: Random matrix models predict how real-world neural representations generalize
Of theories for why large-scale machine learning models generalize despite being vastly
overparameterized, which of their assumptions are needed to capture the qualitative …
Universality laws for high-dimensional learning with random features
We prove a universality theorem for learning with random features. Our result shows that, in
terms of training and generalization errors, a random feature model with a nonlinear …
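For concreteness, here is a minimal random feature model with a nonlinear activation: fixed random first-layer weights, nonlinear features, and a ridge-regularized linear readout. The data model, ReLU activation, and ridge penalty are assumptions made only to keep the example self-contained.

```python
# Sketch of a random feature model: phi(x) = ReLU(F x) with fixed random F,
# followed by ridge regression on the features.
import numpy as np

rng = np.random.default_rng(1)
n, d, p, lam = 1024, 256, 512, 1e-2             # samples, input dim, #features, ridge (assumed)

X_train = rng.standard_normal((n, d))
X_test = rng.standard_normal((n, d))
beta = rng.standard_normal(d) / np.sqrt(d)
y_train = X_train @ beta + 0.1 * rng.standard_normal(n)   # noisy linear teacher (assumed)
y_test = X_test @ beta

F = rng.standard_normal((p, d)) / np.sqrt(d)    # fixed random first layer
phi = lambda X: np.maximum(X @ F.T, 0.0)        # nonlinear random features, shape (n, p)

Phi = phi(X_train)
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y_train)

train_err = np.mean((Phi @ w - y_train) ** 2)
test_err = np.mean((phi(X_test) @ w - y_test) ** 2)
print(f"train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```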
A statistical mechanics framework for Bayesian deep neural networks beyond the infinite-width limit
Despite the practical success of deep neural networks, a comprehensive theoretical
framework that can predict practically relevant scores, such as the test accuracy, from …
Self-consistent dynamical field theory of kernel evolution in wide neural networks
B Bordelon, C Pehlevan - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We analyze feature learning in infinite-width neural networks trained with gradient flow
through a self-consistent dynamical field theory. We construct a collection of deterministic …
Deterministic equivalent and error universality of deep random features learning
This manuscript considers the problem of learning a random Gaussian network function
using a fully connected network with frozen intermediate layers and trainable readout layer …
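A rough sketch of this "deep random features" setting: the target is the readout of a random Gaussian deep network, and the learner propagates inputs through its own frozen random intermediate layers and trains only the readout (here by ridge regression). Depths, widths, the tanh activation, and the ridge fit are illustrative assumptions.

```python
# Sketch: frozen random intermediate layers, trainable readout only.
import numpy as np

rng = np.random.default_rng(2)
n, d, width, depth, lam = 2048, 128, 256, 3, 1e-3   # sizes and ridge penalty (assumed)

def random_net_features(X, widths, seed):
    """Propagate X through a frozen network with random Gaussian weights."""
    r = np.random.default_rng(seed)
    H = X
    for w_in, w_out in zip(widths[:-1], widths[1:]):
        W = r.standard_normal((w_out, w_in)) / np.sqrt(w_in)
        H = np.tanh(H @ W.T)
    return H

widths = [d] + [width] * depth
X_train = rng.standard_normal((n, d))
X_test = rng.standard_normal((n, d))

# Target: readout of an independent random Gaussian deep network (the "teacher").
a_star = rng.standard_normal(width) / np.sqrt(width)
y_train = random_net_features(X_train, widths, seed=10) @ a_star
y_test = random_net_features(X_test, widths, seed=10) @ a_star

# Learner: frozen random intermediate layers, ridge-trained readout.
Phi_train = random_net_features(X_train, widths, seed=20)
Phi_test = random_net_features(X_test, widths, seed=20)
a_hat = np.linalg.solve(Phi_train.T @ Phi_train + lam * np.eye(width),
                        Phi_train.T @ y_train)
print("test MSE:", np.mean((Phi_test @ a_hat - y_test) ** 2))
```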
Bayes-optimal learning of deep random networks of extensive-width
We consider the problem of learning a target function corresponding to a deep, extensive-
width, non-linear neural network with random Gaussian weights. We consider the asymptotic …
On the stepwise nature of self-supervised learning
We present a simple picture of the training process of self-supervised learning methods with
dual deep networks. In our picture, these methods learn their high-dimensional embeddings …
Neural networks trained with SGD learn distributions of increasing complexity
The uncanny ability of over-parameterised neural networks to generalise well has been
explained using various" simplicity biases". These theories postulate that neural networks …
explained using various" simplicity biases". These theories postulate that neural networks …