Learning in the presence of low-dimensional structure: a spiked random matrix perspective

J Ba, MA Erdogdu, T Suzuki… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
We consider the learning of a single-index target function $f_*:\mathbb{R}^d \to \mathbb{R}$ under spiked covariance data: $$f_*(\boldsymbol{x}) = \textstyle\sigma_*(\frac{1}{\sqrt …
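
A minimal sketch of this data model, assuming a rank-one spike of strength theta along the target direction and sigma_* = ReLU (both are illustrative choices, since the snippet truncates the exact definition):

import numpy as np

rng = np.random.default_rng(0)
d, n, theta = 512, 1000, 5.0                 # dimension, samples, spike strength (assumed)
v = rng.standard_normal(d)
v /= np.linalg.norm(v)                       # target index / spike direction

# spiked covariance inputs: x ~ N(0, I_d + theta * v v^T)
z = rng.standard_normal((n, d))
x = z + (np.sqrt(1.0 + theta) - 1.0) * np.outer(z @ v, v)

# single-index labels: y = sigma_*(<x, v> / sqrt(d))
y = np.maximum(x @ v / np.sqrt(d), 0.0)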

Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond

T Suzuki, D Wu, K Oko… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Neural networks in the mean-field regime are known to be capable of feature learning,
unlike their kernel (NTK) counterparts. Recent works have shown that mean-field neural …
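
A rough sketch of the sparse-parity setup with a single mean-field Langevin step (noisy gradient descent on the neuron parameters); the k = 3 parity, tanh activation, and temperature below are assumptions, not the paper's exact setting:

import numpy as np

rng = np.random.default_rng(1)
d, k, n, width = 30, 3, 2000, 512
x = rng.choice([-1.0, 1.0], size=(n, d))
y = np.prod(x[:, :k], axis=1)                # k-sparse parity of the first k coordinates

# mean-field two-layer network: f(x) = (1/width) * sum_j a_j * tanh(w_j . x)
w = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=width)

lr, temp = 0.5, 1e-4
pre = np.tanh(x @ w.T)                       # (n, width) activations
resid = pre @ a / width - y                  # network output minus target
grad_w = ((resid[:, None] * (1.0 - pre**2)).T @ x) * (a[:, None] / (width * n))
# Langevin step: gradient descent plus Gaussian noise at temperature `temp`
w += -lr * grad_w + np.sqrt(2.0 * lr * temp) * rng.standard_normal(w.shape)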

The benefits of reusing batches for gradient descent in two-layer networks: Breaking the curse of information and leap exponents

Y Dandi, E Troiani, L Arnaboldi, L Pesce… - arXiv preprint arXiv …, 2024 - arxiv.org
We investigate the training dynamics of two-layer neural networks when learning multi-index
target functions. We focus on multi-pass gradient descent (GD) that reuses the batches …
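
A toy version of the multi-pass protocol (two-layer ReLU student, a product of two linear features as the multi-index target; all sizes are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(2)
d, n, width, lr, passes = 200, 400, 128, 0.1, 50
u = np.linalg.qr(rng.standard_normal((d, 2)))[0].T   # two hidden target directions
x = rng.standard_normal((n, d))                      # one fixed batch, reused every pass
y = (x @ u[0]) * (x @ u[1])

w = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.standard_normal(width) / np.sqrt(width)
for _ in range(passes):                              # multi-pass GD on the same batch
    h = np.maximum(x @ w.T, 0.0)
    resid = h @ a - y
    w -= lr * ((resid[:, None] * (h > 0)).T @ x) * (a[:, None] / n)

# fraction of first-layer weight norm captured by the hidden subspace
print(np.linalg.norm(w @ u.T) / np.linalg.norm(w))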

Asymptotics of feature learning in two-layer networks after one gradient-step

H Cui, L Pesce, Y Dandi, F Krzakala, YM Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
In this manuscript we investigate how two-layer neural networks learn
features from data and improve over the kernel regime after being trained with a single …
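
A sketch of the one-step experiment; the single-index teacher, ReLU student, and sqrt(width) learning rate are assumptions. The point it illustrates is that the first-layer update after one large step is close to rank one and aligned with the target direction:

import numpy as np

rng = np.random.default_rng(3)
d, n, width = 300, 3000, 256
v = rng.standard_normal(d)
v /= np.linalg.norm(v)
x = rng.standard_normal((n, d))
y = np.tanh(x @ v)                                   # illustrative teacher

w = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=width) / np.sqrt(width)

# a single large gradient step on the first layer
h = np.maximum(x @ w.T, 0.0)
resid = h @ a - y
grad_w = ((resid[:, None] * (h > 0)).T @ x) * (a[:, None] / n)
w1 = w - np.sqrt(width) * grad_w

_, s, vt = np.linalg.svd(w1 - w, full_matrices=False)
print("alignment of top singular vector with v:", abs(vt[0] @ v))
print("spectral gap s1/s2:", s[0] / s[1])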

Nonlinear spiked covariance matrices and signal propagation in deep neural networks

Z Wang, D Wu, Z Fan - arXiv preprint arXiv:2402.10127, 2024 - arxiv.org
Many recent works have studied the eigenvalue spectrum of the Conjugate Kernel (CK)
defined by the nonlinear feature map of a feedforward neural network. However, existing …
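
A small numerical illustration of the CK spectrum under a spiked input covariance (ReLU feature map; the spike strength and aspect ratios are arbitrary choices):

import numpy as np

rng = np.random.default_rng(4)
d, n, width, theta = 400, 800, 1200, 8.0
v = rng.standard_normal(d)
v /= np.linalg.norm(v)

# inputs with covariance I_d + theta * v v^T
z = rng.standard_normal((n, d))
x = z + (np.sqrt(1.0 + theta) - 1.0) * np.outer(z @ v, v)

# Conjugate Kernel: Gram matrix of the post-activation features sigma(x W / sqrt(d))
w = rng.standard_normal((d, width))
f = np.maximum(x @ w / np.sqrt(d), 0.0)
ck = f @ f.T / width
eigs = np.linalg.eigvalsh(ck)
print("top five CK eigenvalues:", eigs[-5:][::-1])   # for large theta, an outlier separates from the bulk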

Repetita iuvant: Data repetition allows SGD to learn high-dimensional multi-index functions

L Arnaboldi, Y Dandi, F Krzakala, L Pesce… - arXiv preprint arXiv …, 2024 - arxiv.org
Neural networks can identify low-dimensional relevant structures within high-dimensional
noisy data, yet our mathematical understanding of how they do so remains scarce. Here, we …
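
A minimal experiment contrasting single use against repeated use of each fresh batch; the single-neuron student, He_3 teacher, and step size are illustrative, and whether (and how fast) the overlap grows under repetition is exactly what results of this kind quantify:

import numpy as np

rng = np.random.default_rng(5)
d, nb, steps = 100, 64, 1000
v = rng.standard_normal(d)
v /= np.linalg.norm(v)

for reps in (1, 4):                          # how often each batch is reused
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for _ in range(steps):
        x = rng.standard_normal((nb, d))     # fresh batch ...
        s_t = x @ v
        y = s_t**3 - 3.0 * s_t               # He_3 teacher: high information exponent
        for _ in range(reps):                # ... reused `reps` times before discarding
            s = x @ w
            resid = (s**3 - 3.0 * s) - y
            w -= 0.01 * ((resid * (3.0 * s**2 - 3.0)) @ x) / nb
            w /= np.linalg.norm(w)           # keep the student on the sphere
    print(f"reps={reps}: overlap |<w, v>| = {abs(w @ v):.3f}")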

Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations

K Oko, Y Song, T Suzuki, D Wu - arXiv preprint arXiv:2406.11828, 2024 - arxiv.org
We study the computational and sample complexity of learning a target function $f_*:\mathbb{R}^d \to \mathbb{R}$ with additive structure, that is, $f_*(x) = \frac{1}{\sqrt …
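
A sketch of the additive ("sum of ridge functions") data model; the orthonormal directions, the He_2 link, and the sizes are assumptions standing in for the truncated formula:

import numpy as np

rng = np.random.default_rng(6)
d, M, n = 256, 8, 1000
V = np.linalg.qr(rng.standard_normal((d, M)))[0]     # M orthonormal ridge directions
x = rng.standard_normal((n, d))

# f_*(x) = (1/sqrt(M)) * sum_m sigma(<x, v_m>), here with sigma(z) = z^2 - 1
s = x @ V                                            # (n, M) ridge projections
y = (s**2 - 1.0).sum(axis=1) / np.sqrt(M)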

Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs

L Arnaboldi, Y Dandi, F Krzakala, B Loureiro… - arXiv preprint arXiv …, 2024 - arxiv.org
We study the impact of the batch size $n_b$ on the iteration time $T$ of training two-layer
neural networks with one-pass stochastic gradient descent (SGD) on multi-index target …
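
A toy run of one-pass SGD at several batch sizes under a fixed total sample budget, so larger $n_b$ means fewer iterations $T$ (quadratic single-index teacher; the step-size scaling is an arbitrary choice):

import numpy as np

rng = np.random.default_rng(7)
d, budget = 100, 50_000                  # total samples fixed across batch sizes
v = rng.standard_normal(d)
v /= np.linalg.norm(v)

for nb in (1, 50, 1000):                 # batch size n_b; iterations T = budget / n_b
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    lr = 0.02 * np.sqrt(nb)              # illustrative step-size scaling
    for _ in range(budget // nb):
        x = rng.standard_normal((nb, d)) # one-pass: every batch is fresh
        resid = (x @ w)**2 - (x @ v)**2  # student and teacher share sigma(z) = z^2
        w -= lr * ((resid * 2.0 * (x @ w)) @ x) / nb
        w /= np.linalg.norm(w)
    print(f"n_b={nb:4d}, T={budget // nb:5d}: overlap = {abs(w @ v):.3f}")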

Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit

JD Lee, K Oko, T Suzuki, D Wu - arXiv preprint arXiv:2406.01581, 2024 - arxiv.org
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \textstyle\sigma_*\left(\langle \boldsymbol{x}, \boldsymbol …
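
Not the paper's algorithm, but a standard spectral warm start for polynomial single-index models, included only to make the setting concrete (the degree-3 link with a nonzero second Hermite coefficient is an assumption):

import numpy as np

rng = np.random.default_rng(8)
d, n = 200, 20_000
theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)
x = rng.standard_normal((n, d))
s = x @ theta
y = (s**3 - 3.0 * s) + (s**2 - 1.0)          # illustrative polynomial link

# E[y * (x x^T - I)] is proportional to theta theta^T when the link has a
# nonzero second Hermite coefficient, so the top eigenvector estimates theta
M = (x.T * y) @ x / n - y.mean() * np.eye(d)
w0 = np.linalg.eigh(M)[1][:, -1]
print("warm-start overlap |<w0, theta>|:", abs(w0 @ theta))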

Improved statistical and computational complexity of the mean-field Langevin dynamics under structured data

A Nitanda, K Oko, T Suzuki, D Wu - The Twelfth International Conference on Learning Representations, 2024 - openreview.net
Recent works have shown that neural networks optimized by gradient-based methods can
adapt to sparse or low-dimensional target functions through feature learning; an often …
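
A crude mean-field Langevin loop on anisotropic ("structured") inputs, where the Gaussian noise implements the entropy term and weight decay the ridge penalty; the covariance profile, target, and hyperparameters are fabricated for illustration:

import numpy as np

rng = np.random.default_rng(9)
d, n, width = 50, 1000, 256
scales = np.concatenate([3.0 * np.ones(5), 0.5 * np.ones(d - 5)])
x = rng.standard_normal((n, d)) * scales     # anisotropic covariance: a few large directions
v = np.zeros(d)
v[:5] = 1.0 / np.sqrt(5.0)
y = np.sign(x @ v)                           # target supported on the large directions

w = rng.standard_normal((width, d))
lr, lam, temp = 0.2, 1e-3, 1e-3
for _ in range(500):
    pre = np.tanh(x @ w.T)                   # (n, width) activations
    resid = pre.mean(axis=1) - y             # mean-field output minus target
    grad = ((resid[:, None] * (1.0 - pre**2)).T @ x) / (n * width)
    # MFLD update: gradient + ridge (lam) + Gaussian noise (entropy, temp)
    w -= lr * (grad + lam * w)
    w += np.sqrt(2.0 * lr * temp) * rng.standard_normal(w.shape)

print("train accuracy:", np.mean(np.sign(np.tanh(x @ w.T).mean(axis=1)) == y))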