High-dimensional asymptotics of feature learning: How one gradient step improves the representation
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a
two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma …
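The model in the snippet above, $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\cdots)$, and a single gradient step on the first-layer weights can be sketched as follows. This is a minimal illustration, not the paper's setup: the ReLU activation, the squared loss, the dimensions, and the step size are all assumptions filled in for the truncated formula.

```python
import numpy as np

# Illustrative sketch of f(x) = a^T sigma(W x) / sqrt(N) and one gradient
# step on W. Activation (ReLU), loss (squared error), and sizes are assumed.
rng = np.random.default_rng(0)
d, N = 8, 16                                  # input dimension, hidden width
W = rng.standard_normal((N, d)) / np.sqrt(d)  # first-layer weights
a = rng.standard_normal(N)                    # second-layer weights (frozen)

def sigma(z):
    return np.maximum(z, 0.0)  # ReLU (assumed; the paper's sigma is generic)

def f(x, W, a):
    return a @ sigma(W @ x) / np.sqrt(N)

# One gradient step on W for a single sample under loss 0.5 * (f(x) - y)^2.
x = rng.standard_normal(d)
y = 1.0
pre = W @ x                     # pre-activations
err = f(x, W, a) - y
loss_before = 0.5 * err**2
# dL/dW = err * (a * sigma'(pre)) x^T / sqrt(N), with sigma'(z) = 1{z > 0}
grad_W = err * ((a * (pre > 0)).reshape(-1, 1) * x.reshape(1, -1)) / np.sqrt(N)
eta = 0.01                      # small step size (assumed)
W -= eta * grad_W
```

With a small step size the loss on this sample decreases, which is the single-step improvement of the representation that the paper analyzes in the high-dimensional limit.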
Learning single-index models with shallow neural networks
Single-index models are a class of functions given by an unknown univariate "link" function
applied to an unknown one-dimensional projection of the input. These models are …
High-dimensional limit theorems for SGD: Effective dynamics and critical scaling
G Ben Arous, R Gheissari… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in
the high-dimensional regime. We prove limit theorems for the trajectories of summary …
Provable guarantees for neural networks via gradient feature learning
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …
Dynamics of finite width kernel and prediction fluctuations in mean field neural networks
B Bordelon, C Pehlevan - Advances in Neural Information …, 2024 - proceedings.neurips.cc
We analyze the dynamics of finite-width effects in wide but finite feature-learning neural
networks. Starting from a dynamical mean field theory description of infinite width deep …
Neural networks efficiently learn low-dimensional representations with SGD
We study the problem of training a two-layer neural network (NN) of arbitrary width using
stochastic gradient descent (SGD) where the input $\boldsymbol{x} \in \mathbb{R}^d$ is …
Data-driven emergence of convolutional structure in neural networks
A Ingrosso, S Goldt - … of the National Academy of Sciences, 2022 - National Acad Sciences
Exploiting data invariances is crucial for efficient learning in both artificial and biological
neural circuits. Understanding how neural networks can discover appropriate …
From high-dimensional & mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks
This manuscript investigates the one-pass stochastic gradient descent (SGD) dynamics of a
two-layer neural network trained on Gaussian data and labels generated by a similar …
On the different regimes of stochastic gradient descent
A Sclocchi, M Wyart - … of the National Academy of Sciences, 2024 - National Acad Sciences
Modern deep networks are trained with stochastic gradient descent (SGD) whose key
hyperparameters are the number of data considered at each step or batch size B, and the …
Rigorous dynamical mean-field theory for stochastic gradient descent methods
We prove closed-form equations for the exact high-dimensional asymptotics of a family of
first-order gradient-based methods, learning an estimator (e.g., M-estimator, shallow neural …