High-dimensional asymptotics of feature learning: How one gradient step improves the representation

J Ba, MA Erdogdu, T Suzuki, Z Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a
two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}} \boldsymbol{a}^\top \sigma$ …
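To make the setting concrete, here is a minimal NumPy sketch (not the authors' code) of a single full-batch gradient step on the first-layer weights $\boldsymbol{W}$ with the second layer $\boldsymbol{a}$ frozen; the Gaussian data, squared loss, ReLU activation, target, and step size are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: one full-batch gradient step on the first-layer weights W of
# f(x) = a^T sigma(W x) / sqrt(N), with a frozen. Data model, activation, loss,
# and step size below are illustrative assumptions, not the paper's exact setup.
rng = np.random.default_rng(0)
d, N, n = 100, 200, 1000            # input dimension, width, sample size
X = rng.standard_normal((n, d))     # isotropic Gaussian inputs
y = np.tanh(X[:, 0])                # placeholder single-index-style target

W = rng.standard_normal((N, d)) / np.sqrt(d)   # first-layer weights
a = rng.standard_normal(N)                     # second-layer weights (kept fixed)
sigma = lambda z: np.maximum(z, 0.0)           # ReLU activation

pre = X @ W.T                                  # (n, N) pre-activations
f = sigma(pre) @ a / np.sqrt(N)                # network outputs

# Gradient of the squared loss (1/2n) * sum_i (f_i - y_i)^2 with respect to W
err = f - y
grad_W = ((err[:, None] * (pre > 0) * a[None, :] / np.sqrt(N)).T @ X) / n

eta = 1.0
W_after_one_step = W - eta * grad_W            # the updated representation
```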

Learning single-index models with shallow neural networks

A Bietti, J Bruna, C Sanford… - Advances in Neural …, 2022 - proceedings.neurips.cc
Single-index models are a class of functions given by an unknown univariate "link" function
applied to an unknown one-dimensional projection of the input. These models are …
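Concretely, a single-index model takes the form $y = g(\langle\boldsymbol{\theta}, \boldsymbol{x}\rangle)$ for an unknown link $g$ and an unknown direction $\boldsymbol{\theta}$. A minimal data-generation sketch, where the specific link (a Hermite polynomial), dimension, and noise level are assumptions:

```python
import numpy as np

# Minimal single-index data generator: y = g(<theta, x>) + noise.
# The specific link g (third Hermite polynomial), dimension, and noise level
# are illustrative assumptions.
rng = np.random.default_rng(1)
d, n = 50, 2000
theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)          # unknown unit direction
g = lambda t: t**3 - 3 * t              # example link: He_3(t)
X = rng.standard_normal((n, d))         # Gaussian inputs
y = g(X @ theta) + 0.1 * rng.standard_normal(n)
```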

High-dimensional limit theorems for SGD: Effective dynamics and critical scaling

G Ben Arous, R Gheissari… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in
the high-dimensional regime. We prove limit theorems for the trajectories of summary …

Provable guarantees for neural networks via gradient feature learning

Z Shi, J Wei, Y Liang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …

Dynamics of finite width kernel and prediction fluctuations in mean field neural networks

B Bordelon, C Pehlevan - Advances in Neural Information …, 2024 - proceedings.neurips.cc
We analyze the dynamics of finite width effects in wide but finite feature learning neural
networks. Starting from a dynamical mean field theory description of infinite width deep …

Neural networks efficiently learn low-dimensional representations with SGD

A Mousavi-Hosseini, S Park, M Girotti… - arXiv preprint arXiv …, 2022 - arxiv.org
We study the problem of training a two-layer neural network (NN) of arbitrary width using
stochastic gradient descent (SGD) where the input $\boldsymbol{x} \in \mathbb{R}^d$ is …

Data-driven emergence of convolutional structure in neural networks

A Ingrosso, S Goldt - … of the National Academy of Sciences, 2022 - National Acad Sciences
Exploiting data invariances is crucial for efficient learning in both artificial and biological
neural circuits. Understanding how neural networks can discover appropriate …

From high-dimensional & mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks

L Arnaboldi, L Stephan, F Krzakala… - The Thirty Sixth …, 2023 - proceedings.mlr.press
This manuscript investigates the one-pass stochastic gradient descent (SGD) dynamics of a
two-layer neural network trained on Gaussian data and labels generated by a similar …
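As a rough illustration of the one-pass (online) SGD protocol described here, a minimal sketch in which each fresh Gaussian sample is used exactly once and labels come from a small two-layer teacher; the widths, activation, and step size are assumptions, not the paper's setup.

```python
import numpy as np

# One-pass (online) SGD sketch: a two-layer student trained on Gaussian data
# with labels from a similar two-layer teacher. Widths, activation, and step
# size are illustrative assumptions, not the paper's setup.
rng = np.random.default_rng(2)
d, k_teacher, k_student = 100, 2, 4
eta, steps = 0.5, 10_000

W_teacher = rng.standard_normal((k_teacher, d)) / np.sqrt(d)   # fixed teacher
W_student = rng.standard_normal((k_student, d)) / np.sqrt(d)   # trained weights

for _ in range(steps):
    x = rng.standard_normal(d)                       # fresh sample (used once)
    y = np.tanh(W_teacher @ x).mean()                # teacher label
    pre = W_student @ x
    f = np.tanh(pre).mean()                          # student prediction
    # gradient of 0.5 * (f - y)^2 with respect to the student's first layer
    grad = (f - y) * ((1 - np.tanh(pre) ** 2)[:, None] * x[None, :]) / k_student
    W_student -= eta * grad
```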

On the different regimes of stochastic gradient descent

A Sclocchi, M Wyart - … of the National Academy of Sciences, 2024 - National Acad Sciences
Modern deep networks are trained with stochastic gradient descent (SGD) whose key
hyperparameters are the number of data considered at each step or batch size B, and the …
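In many analyses, the interplay between these two hyperparameters is summarized by the ratio of learning rate to batch size, since mini-batch gradient noise shrinks roughly like $1/B$. A small sketch on an assumed linear-regression toy loss (not taken from the paper) that checks this scaling empirically:

```python
import numpy as np

# Empirical check that mini-batch gradient noise shrinks roughly like 1/B, so
# the effective SGD noise scale is often summarized by eta / B. The quadratic
# (linear-regression) toy loss below is an assumption, not taken from the paper.
rng = np.random.default_rng(3)
n, d = 10_000, 20
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star + rng.standard_normal(n)

w = np.zeros(d)
full_grad = X.T @ (X @ w - y) / n            # full-batch gradient at w

for B in (1, 10, 100, 1000):
    samples = []
    for _ in range(500):
        idx = rng.choice(n, size=B, replace=False)
        g = X[idx].T @ (X[idx] @ w - y[idx]) / B     # mini-batch gradient
        samples.append(np.sum((g - full_grad) ** 2))
    print(f"B={B:5d}  mean squared gradient noise = {np.mean(samples):.3f}")
```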

Rigorous dynamical mean-field theory for stochastic gradient descent methods

C Gerbelot, E Troiani, F Mignacco, F Krzakala… - SIAM Journal on …, 2024 - SIAM
We prove closed-form equations for the exact high-dimensional asymptotics of a family of
first-order gradient-based methods, learning an estimator (e.g., M-estimator, shallow neural …