The computational complexity of learning Gaussian single-index models

A Damian, L Pillaud-Vivien, JD Lee, J Bruna - arXiv preprint arXiv …, 2024 - arxiv.org
Single-Index Models are high-dimensional regression problems with planted structure,
whereby labels depend on an unknown one-dimensional projection of the input via a …
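
As a concrete illustration of the setup, here is a minimal sketch of sampling from a Gaussian single-index model; the link function and direction below are illustrative choices, not the paper's specific ones.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 64, 10_000

    # Hidden one-dimensional direction (unknown to the learner).
    theta = rng.standard_normal(d)
    theta /= np.linalg.norm(theta)

    # Illustrative link function; the model only assumes labels depend
    # on the projection <x, theta>, not on x itself.
    def sigma(z):
        return z ** 3 - 3 * z  # third Hermite polynomial He_3

    X = rng.standard_normal((n, d))   # Gaussian inputs
    y = sigma(X @ theta)              # labels via the 1-d projection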

Repetita iuvant: Data repetition allows SGD to learn high-dimensional multi-index functions

L Arnaboldi, Y Dandi, F Krzakala, L Pesce… - arXiv preprint arXiv …, 2024 - arxiv.org
Neural networks can identify low-dimensional relevant structures within high-dimensional
noisy data, yet our mathematical understanding of how they do so remains scarce. Here, we …
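
A rough caricature of the protocol the title alludes to, under assumed details (batch size, repeat count, and the toy linear student are all illustrative): each sampled batch is reused for several SGD steps instead of being discarded after one.

    import numpy as np

    rng = np.random.default_rng(0)
    d, batch, lr = 32, 64, 0.01
    n_batches, repeats = 200, 4               # repeats > 1 is the point

    teacher = rng.standard_normal(d)
    teacher /= np.linalg.norm(teacher)
    w = rng.standard_normal(d) / np.sqrt(d)   # toy linear student

    for _ in range(n_batches):
        X = rng.standard_normal((batch, d))
        y = np.tanh(X @ teacher)              # stand-in target
        for _ in range(repeats):              # reuse the same batch several times
            w -= lr * X.T @ (X @ w - y) / batch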

Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations

K Oko, Y Song, T Suzuki, D Wu - arXiv preprint arXiv:2406.11828, 2024 - arxiv.org
We study the computational and sample complexity of learning a target function $f_*:\mathbb{R}^d\to\mathbb{R}$ with additive structure, that is, $f_*(x)=\frac{1}{\sqrt…
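
One plausible reading of the truncated formula, as a sketch: a sum of M ridge functions with orthonormal directions and a 1/sqrt(M) normalization. All specifics here (M, the nonlinearity, the normalization) are assumptions, since the abstract is cut off.

    import numpy as np

    rng = np.random.default_rng(0)
    d, M, n = 64, 8, 5_000

    # Assumed form: f_*(x) = (1/sqrt(M)) * sum_m sigma(<x, v_m>) with
    # orthonormal directions v_m.
    V, _ = np.linalg.qr(rng.standard_normal((d, M)))   # orthonormal columns
    sigma = np.tanh                                    # illustrative ridge nonlinearity

    X = rng.standard_normal((n, d))
    y = sigma(X @ V).sum(axis=1) / np.sqrt(M)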

Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit

JD Lee, K Oko, T Suzuki, D Wu - arXiv preprint arXiv:2406.01581, 2024 - arxiv.org
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x})=\textstyle\sigma_*\left(\langle\boldsymbol{x},\boldsymbol…
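
A minimal caricature of the dynamics in question (not the paper's actual algorithm, rates, or link function): online SGD on a single neuron over fresh Gaussian samples, tracking the overlap with the hidden direction.

    import numpy as np

    rng = np.random.default_rng(0)
    d, lr, steps = 128, 0.05, 20_000

    theta = rng.standard_normal(d)
    theta /= np.linalg.norm(theta)
    sigma = np.tanh                      # illustrative link sigma_*

    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for _ in range(steps):
        x = rng.standard_normal(d)       # one fresh Gaussian sample per step
        y = sigma(x @ theta)
        pred = sigma(x @ w)
        # squared-loss gradient for the single neuron, then spherical step
        g = (pred - y) * (1 - pred ** 2) * x
        w -= lr * g
        w /= np.linalg.norm(w)

    print("overlap <w, theta>:", w @ theta)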

A spring-block theory of feature learning in deep neural networks

C Shi, L Pan, I Dokmanić - arXiv preprint arXiv:2407.19353, 2024 - arxiv.org
A central question in deep learning is how deep neural networks (DNNs) learn features.
DNN layers progressively collapse data into a regular low-dimensional geometry. This …

Sliding down the stairs: how correlated latent variables accelerate learning with neural networks

L Bardone, S Goldt - arXiv preprint arXiv:2404.08602, 2024 - arxiv.org
Neural networks extract features from data using stochastic gradient descent (SGD). In
particular, higher-order input cumulants (HOCs) are crucial for their performance. However …
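
One way higher-order cumulants enter: correlating labels with Hermite polynomials He_k of a projection probes order-k statistics. A toy probe, with a hand-picked cubic target so that only the third-order statistic carries signal (illustrative, not the paper's estimator):

    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 32, 200_000

    theta = rng.standard_normal(d)
    theta /= np.linalg.norm(theta)
    X = rng.standard_normal((n, d))
    y = (X @ theta) ** 3 - 3 * (X @ theta)      # signal lives in order-3 statistics

    z = X @ theta                               # probe along the true direction
    he = {1: z, 2: z**2 - 1, 3: z**3 - 3*z}     # probabilists' Hermite polynomials
    for k, h in he.items():
        print(k, np.mean(y * h))                # only k = 3 shows a signal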

On the Complexity of Learning Sparse Functions with Statistical and Gradient Queries

N Joshi, T Misiakiewicz, N Srebro - arXiv preprint arXiv:2407.05622, 2024 - arxiv.org
The goal of this paper is to investigate the complexity of gradient algorithms when learning
sparse functions (juntas). We introduce a type of Statistical Queries ($\mathsf{SQ}$), which …
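
A statistical query returns an expectation up to a tolerance tau rather than raw samples. A minimal mock of such an oracle, applied to a 1-junta (the simulated-noise adversary and all names are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    def sq_oracle(phi, sample, tau, n=100_000):
        """Answer E[phi(x, y)] up to error at most tau.

        Here the adversarial error is simulated with uniform noise in [-tau, tau].
        """
        X, y = sample(n)
        return np.mean(phi(X, y)) + rng.uniform(-tau, tau)

    # Example: a 1-junta y = x_3 over uniform {-1, +1}^d inputs.
    d = 16
    def sample(n):
        X = rng.choice([-1.0, 1.0], size=(n, d))
        return X, X[:, 3]

    # Query the correlation of y with each coordinate to find the junta.
    corrs = [sq_oracle(lambda X, y, i=i: X[:, i] * y, sample, tau=0.01)
             for i in range(d)]
    print(np.argmax(np.abs(corrs)))   # -> 3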

Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics

A Mousavi-Hosseini, D Wu, MA Erdogdu - arXiv preprint arXiv:2408.07254, 2024 - arxiv.org
We study the problem of learning multi-index models in high dimensions using a two-layer neural network trained with the mean-field Langevin algorithm. Under mild distributional …

Towards a theory of how the structure of language is acquired by deep neural networks

F Cagnetta, M Wyart - arXiv preprint arXiv:2406.00048, 2024 - arxiv.org
How much data is required to learn the structure of a language via next-token prediction?
We study this question for synthetic datasets generated via a Probabilistic Context-Free …
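
For reference, a probabilistic context-free grammar expands nonterminals by sampling production rules; a tiny generator over a toy grammar (not the paper's synthetic datasets):

    import random

    random.seed(0)

    # Toy PCFG: each nonterminal maps to (probability, right-hand side) rules.
    RULES = {
        "S":  [(1.0, ["NP", "VP"])],
        "NP": [(0.7, ["the", "N"]), (0.3, ["N"])],
        "VP": [(0.6, ["V", "NP"]), (0.4, ["V"])],
        "N":  [(0.5, ["cat"]), (0.5, ["dog"])],
        "V":  [(0.5, ["sees"]), (0.5, ["chases"])],
    }

    def generate(symbol="S"):
        if symbol not in RULES:               # terminal: emit the token
            return [symbol]
        probs, rhss = zip(*RULES[symbol])
        rhs = random.choices(rhss, weights=probs)[0]
        return [tok for s in rhs for tok in generate(s)]

    print(" ".join(generate()))   # e.g. "the cat chases dog"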

Pruning is Optimal for Learning Sparse Features in High-Dimensions

NM Vural, MA Erdogdu - arXiv preprint arXiv:2406.08658, 2024 - arxiv.org
While it is commonly observed in practice that pruning networks to a certain level of sparsity
can improve the quality of the features, a theoretical explanation of this phenomenon …
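
Magnitude pruning in its simplest form, for orientation (an illustrative procedure; the paper analyzes a specific pruned-then-trained regime):

    import numpy as np

    rng = np.random.default_rng(0)

    def magnitude_prune(W, sparsity):
        """Zero out the smallest-magnitude fraction `sparsity` of entries."""
        k = int(sparsity * W.size)
        if k == 0:
            return W.copy()
        thresh = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
        return np.where(np.abs(W) <= thresh, 0.0, W)

    W = rng.standard_normal((4, 8))
    W_pruned = magnitude_prune(W, sparsity=0.75)
    print((W_pruned == 0).mean())   # -> 0.75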