High-dimensional asymptotics of feature learning: How one gradient step improves the representation
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a
two-layer neural network: $f(\boldsymbol{x})=\frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma$ …
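For concreteness, here is a minimal numpy sketch of the setup described above: one full-batch gradient step on the first-layer weights of a two-layer network with fixed readout. The squared loss, ReLU activation, learning rate, and dimensions are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, n = 50, 100, 200                          # input dim, hidden width, samples

X = rng.standard_normal((n, d)) / np.sqrt(d)    # inputs
y = rng.standard_normal(n)                      # placeholder targets

W = rng.standard_normal((N, d))                 # first-layer weights (trained)
a = rng.standard_normal(N)                      # readout weights (kept fixed here)

sigma = lambda z: np.maximum(z, 0.0)            # ReLU (illustrative choice)
dsigma = lambda z: (z > 0).astype(float)

def f(X, W, a):
    # f(x) = a^T sigma(W x) / sqrt(N)
    return sigma(X @ W.T) @ a / np.sqrt(N)

# one full-batch gradient step on W under squared loss
eta = 1.0
pre = X @ W.T                                   # (n, N) pre-activations
residual = f(X, W, a) - y                       # (n,)
grad_W = (dsigma(pre) * np.outer(residual, a / np.sqrt(N))).T @ X / n
W_after_one_step = W - eta * grad_W
```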
Towards understanding grokking: An effective theory of representation learning
We aim to understand grokking, a phenomenon where models generalize long after
overfitting their training set. We present both a microscopic analysis anchored by an effective …
Learning curves of generic features maps for realistic datasets with a teacher-student model
Teacher-student models provide a framework in which the typical-case performance of high-
dimensional supervised learning can be described in closed form. The assumptions of …
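A minimal sketch of a teacher-student experiment in the spirit of this line of work: a teacher rule generates noisy labels and a student with a generic feature map is fit by ridge regression. The linear teacher, tanh feature map, noise level, and all sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 100, 400                                   # input dimension, training samples

# Teacher: a fixed rule generating noisy labels
w_star = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d))
y = X @ w_star + 0.1 * rng.standard_normal(n)

# Student: ridge regression on a generic (elementwise) feature map
phi = np.tanh
lam = 1e-2
Z = phi(X)
w_hat = np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ y)

# Generalisation error measured on fresh teacher data
X_test = rng.standard_normal((1000, d))
mse = np.mean((phi(X_test) @ w_hat - X_test @ w_star) ** 2)
print(f"test MSE: {mse:.3f}")
```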
Universality laws for high-dimensional learning with random features
We prove a universality theorem for learning with random features. Our result shows that, in
terms of training and generalization errors, a random feature model with a nonlinear …
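As a concrete instance of the random feature model referred to above, the following sketch fixes a random projection, applies a nonlinearity, and trains only a ridge readout. The ReLU nonlinearity, target, regularisation, and dimensions are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
d, p, n = 50, 300, 500                      # input dim, random features, samples

X = rng.standard_normal((n, d)) / np.sqrt(d)
y = np.sign(X[:, 0])                        # illustrative target

F = rng.standard_normal((p, d))             # random, untrained feature matrix
Z = np.maximum(X @ F.T, 0.0)                # nonlinear random features sigma(Fx)

lam = 1e-3                                  # ridge regularisation on the readout
theta = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y)

train_err = np.mean((Z @ theta - y) ** 2)
print(f"training error: {train_err:.4f}")
```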
Generalisation error in learning with random features and the hidden manifold model
We study generalised linear regression and classification for a synthetically generated
dataset encompassing different problems of interest, such as learning with random features …
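One way to instantiate synthetic data of the hidden-manifold type mentioned above: inputs are a nonlinear image of a low-dimensional latent vector, and labels depend only on that latent. The specific nonlinearities and dimensions below are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
d, D, n = 200, 10, 500                       # ambient dim, latent dim, samples

C = rng.standard_normal((d, D))              # fixed map from latent to ambient space
Z = rng.standard_normal((n, D))              # latent coordinates on the hidden manifold

X = np.tanh(Z @ C.T / np.sqrt(D))            # inputs lie near a D-dimensional manifold in R^d
w = rng.standard_normal(D)
y = np.sign(Z @ w)                           # labels depend only on the latent vector

print(X.shape, y.shape)                      # (500, 200) (500,)
```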
A statistical mechanics framework for Bayesian deep neural networks beyond the infinite-width limit
Despite the practical success of deep neural networks, a comprehensive theoretical
framework that can predict practically relevant scores, such as the test accuracy, from …
Deterministic equivalent and error universality of deep random features learning
This manuscript considers the problem of learning a random Gaussian network function
using a fully connected network with frozen intermediate layers and trainable readout layer …
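A short sketch of the architecture described above: random intermediate layers that stay frozen, with only a linear readout fit by ridge regression on the last-layer features. Depth, widths, the tanh activation, and the placeholder targets are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
d, width, depth, n = 50, 100, 3, 400

X = rng.standard_normal((n, d))
y = rng.standard_normal(n)                           # placeholder targets

# Frozen random intermediate layers
layers = [rng.standard_normal((width, d)) / np.sqrt(d)]
layers += [rng.standard_normal((width, width)) / np.sqrt(width) for _ in range(depth - 1)]

def features(X):
    H = X
    for W in layers:                                 # forward pass through frozen layers
        H = np.tanh(H @ W.T)
    return H

Phi = features(X)
lam = 1e-3
a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(width), Phi.T @ y)   # trainable readout
```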
Bayes-optimal learning of deep random networks of extensive-width
We consider the problem of learning a target function corresponding to a deep, extensive-
width, non-linear neural network with random Gaussian weights. We consider the asymptotic …
Neural networks trained with SGD learn distributions of increasing complexity
The uncanny ability of over-parameterised neural networks to generalise well has been
explained using various "simplicity biases". These theories postulate that neural networks …
Precise learning curves and higher-order scalings for dot-product kernel regression
As modern machine learning models continue to advance the computational frontier, it has
become increasingly important to develop precise estimates for expected performance …
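As a concrete example of the model class analysed here, the sketch below runs kernel ridge regression with a dot-product kernel, i.e. a kernel that depends on the inputs only through their inner product. The polynomial kernel, regularisation, and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 100, 300

X = rng.standard_normal((n, d)) / np.sqrt(d)     # roughly unit-norm inputs
y = X @ rng.standard_normal(d)

def dot_product_kernel(A, B, q=3):
    """Kernel depending on the inputs only through their inner product."""
    return (1.0 + A @ B.T) ** q

lam = 1e-3
K = dot_product_kernel(X, X)
alpha = np.linalg.solve(K + lam * np.eye(n), y)  # kernel ridge regression coefficients

X_test = rng.standard_normal((50, d)) / np.sqrt(d)
y_pred = dot_product_kernel(X_test, X) @ alpha
```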