Gradient-based feature learning under structured data
Recent works have demonstrated that the sample complexity of gradient-based learning of
single-index models, i.e., functions that depend on a one-dimensional projection of the input …
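For concreteness, a single-index target has the form f(x) = g(⟨w*, x⟩) for a fixed direction w* and a scalar link g. Below is a minimal sketch of sampling from such a teacher; the direction `w_star` and the cubic Hermite link are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 128, 4096

# Illustrative single-index teacher: labels depend on x only through
# the one-dimensional projection <w_star, x>.
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)       # direction on the unit sphere
link = lambda z: z**3 - 3 * z          # example link function (third Hermite polynomial)

X = rng.standard_normal((n, d))        # isotropic Gaussian inputs
y = link(X @ w_star)                   # y = g(<w_star, x>)
```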
Dynamics of finite width kernel and prediction fluctuations in mean field neural networks
B Bordelon, C Pehlevan - Advances in Neural Information …, 2024 - proceedings.neurips.cc
We analyze the dynamics of finite width effects in wide but finite feature learning neural
networks. Starting from a dynamical mean field theory description of infinite width deep …
How two-layer neural networks learn, one (giant) step at a time
We investigate theoretically how the features of a two-layer neural network adapt to the
structure of the target function through a few large batch gradient descent steps, leading to …
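A minimal sketch of the kind of update discussed here: one large-batch gradient step on the first-layer weights of a two-layer network, followed by a check of how much each neuron has aligned with an (assumed single-index) target direction. The step-size scaling, ReLU activation, and target below are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(1)
d, p, n = 64, 256, 8192                  # input dim, hidden width, batch size
eta = np.sqrt(p)                         # "giant" step size (illustrative scaling)

w_star = rng.standard_normal(d); w_star /= np.linalg.norm(w_star)
X = rng.standard_normal((n, d))
y = np.maximum(X @ w_star, 0.0)          # illustrative single-index target

W = rng.standard_normal((p, d)) / np.sqrt(d)   # first-layer weights (trained)
a = rng.choice([-1.0, 1.0], size=p)            # second-layer weights (frozen)

pre = X @ W.T                            # pre-activations, shape (n, p)
resid = np.maximum(pre, 0.0) @ a / p - y # residuals of the squared loss
# Gradient of (1/2n) sum_i resid_i^2 with respect to W, for f(x) = a^T relu(Wx) / p
grad_W = ((resid[:, None] * (pre > 0) * a[None, :]).T @ X) / (n * p)

W1 = W - eta * grad_W                    # features after one large-batch step
overlap = np.abs(W1 @ w_star) / np.linalg.norm(W1, axis=1)
print("mean neuron-target alignment:", float(overlap.mean()))
```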
Learning time-scales in two-layers neural networks
Gradient-based learning in multi-layer neural networks displays a number of striking
features. In particular, the decrease rate of empirical risk is non-monotone even after …
On learning gaussian multi-index models with gradient flow
We study gradient flow on the multi-index regression problem for high-dimensional
Gaussian data. Multi-index functions consist of a composition of an unknown low-rank linear …
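Schematically, a multi-index target is f(x) = g(Ux) with a low-rank matrix U ∈ R^{k×d}, k ≪ d, and gradient flow is studied on how the student recovers the row span of U. A small sketch of generating data from such a target; the orthonormal U and the choice of g are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, n = 200, 3, 1000                   # ambient dimension, index dimension (k << d)

# Illustrative multi-index teacher: y depends on x only through the
# k-dimensional projection U x, i.e. f(x) = g(U x).
U = np.linalg.qr(rng.standard_normal((d, k)))[0].T   # (k, d), orthonormal rows

def g(z):                                # example nonlinearity on the latent coordinates
    return z[:, 0] * z[:, 1] + np.tanh(z[:, 2])

X = rng.standard_normal((n, d))          # high-dimensional Gaussian inputs
y = g(X @ U.T)                           # labels depend on x only through U x
```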
On the different regimes of stochastic gradient descent
A Sclocchi, M Wyart - … of the National Academy of Sciences, 2024 - National Acad Sciences
Modern deep networks are trained with stochastic gradient descent (SGD) whose key
hyperparameters are the number of data considered at each step, or batch size B, and the …
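A plain SGD loop making the two hyperparameters mentioned here explicit, the batch size B and the learning rate lr; the function names and the least-squares example are illustrative, and the different regimes in this line of work are typically organized by how B and lr scale together.

```python
import numpy as np

def sgd(loss_grad, theta0, X, y, lr=0.1, B=32, steps=1000, seed=0):
    """Minimal SGD: at each step draw a mini-batch of size B and move along
    the averaged gradient with learning rate lr."""
    rng = np.random.default_rng(seed)
    theta = theta0.copy()
    for _ in range(steps):
        idx = rng.choice(len(y), size=B, replace=False)
        theta = theta - lr * loss_grad(theta, X[idx], y[idx])
    return theta

# Example usage: linear least squares, gradient of (1/2B) ||X theta - y||^2
lsq_grad = lambda theta, Xb, yb: Xb.T @ (Xb @ theta - yb) / len(yb)

rng = np.random.default_rng(1)
X = rng.standard_normal((5000, 20))
theta_star = rng.standard_normal(20)
y = X @ theta_star
theta_hat = sgd(lsq_grad, np.zeros(20), X, y, lr=0.05, B=64, steps=2000)
```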
On the impact of overparameterization on the training of a shallow neural network in high dimensions
We study the training dynamics of a shallow neural network with quadratic activation
functions and quadratic cost in a teacher-student setup. In line with previous works on the …
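The setup described here, sketched minimally: a teacher and a (possibly wider, i.e. overparameterized) student with quadratic activation σ(z) = z², compared under a quadratic cost. The widths and scalings below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
d, k, p, n = 50, 2, 8, 2000              # input dim, teacher width, student width (p >= k)

W_teacher = rng.standard_normal((k, d)) / np.sqrt(d)
W_student = rng.standard_normal((p, d)) / np.sqrt(d)   # overparameterized when p > k

X = rng.standard_normal((n, d))
y = ((X @ W_teacher.T) ** 2).sum(axis=1)               # teacher with sigma(z) = z^2

def quadratic_cost(W):
    """Quadratic (squared) cost of the student's predictions against the teacher."""
    pred = ((X @ W.T) ** 2).sum(axis=1)
    return 0.5 * np.mean((pred - y) ** 2)

print("initial cost:", quadratic_cost(W_student))
```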
Grokking as the transition from lazy to rich training dynamics
We propose that the grokking phenomenon, where the train loss of a neural network
decreases much earlier than its test loss, can arise due to a neural network transitioning …
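One standard way to interpolate between the two regimes mentioned here is to rescale the centered network output by a factor α: large α keeps gradient descent close to the linearization around the initialization (lazy/kernel regime), while small α lets the features move (rich regime). A sketch of that parameterization; the tanh network and names are illustrative, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(4)
d, p = 20, 100

W0 = rng.standard_normal((p, d)) / np.sqrt(d)   # weights at initialization (kept fixed)
a = rng.standard_normal(p) / np.sqrt(p)

def f(W, x):
    return a @ np.tanh(W @ x)

def f_alpha(W, x, alpha):
    """Centered, rescaled model: for large alpha, training stays near the
    linearization of f around W0 (lazy regime); for small alpha, the weights
    must move substantially to fit the data (rich, feature-learning regime)."""
    return alpha * (f(W, x) - f(W0, x))
```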
On Single-Index Models beyond Gaussian Data
A Zweig, L Pillaud-Vivien… - Advances in Neural …, 2024 - proceedings.neurips.cc
Sparse high-dimensional functions have arisen as a rich framework to study the behavior of
gradient-descent methods using shallow neural networks, showcasing their ability to …
Asymptotics of feature learning in two-layer networks after one gradient-step
In this manuscript we investigate the problem of how two-layer neural networks learn
features from data, and improve over the kernel regime, after being trained with a single …
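One operational way to see the "improvement over the kernel regime": fit ridge regression on top of the frozen features before and after a single large gradient step on the first layer, and compare test errors. The target, activation, step size, and ridge level below are illustrative assumptions, not the paper's asymptotic setting.

```python
import numpy as np

rng = np.random.default_rng(5)
d, p, n, n_test = 64, 512, 4096, 2048
eta, ridge = 10.0, 1e-3

w_star = rng.standard_normal(d); w_star /= np.linalg.norm(w_star)
target = lambda Z: np.tanh(Z @ w_star)
X, Xt = rng.standard_normal((n, d)), rng.standard_normal((n_test, d))
y, yt = target(X), target(Xt)

W0 = rng.standard_normal((p, d)) / np.sqrt(d)   # first layer at initialization
a = rng.choice([-1.0, 1.0], size=p)             # frozen second layer

def ridge_test_error(W):
    """Test error of ridge regression on the fixed feature map relu(W x)."""
    F, Ft = np.maximum(X @ W.T, 0), np.maximum(Xt @ W.T, 0)
    coef = np.linalg.solve(F.T @ F + ridge * np.eye(p), F.T @ y)
    return np.mean((Ft @ coef - yt) ** 2)

# One full-batch gradient step on the first layer under the squared loss
pre = X @ W0.T
resid = np.maximum(pre, 0) @ a / p - y
grad = ((resid[:, None] * (pre > 0) * a[None, :]).T @ X) / (n * p)
W1 = W0 - eta * grad

print("random features (kernel regime):", ridge_test_error(W0))
print("features after one step        :", ridge_test_error(W1))
```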