The computational complexity of learning Gaussian single-index models
Single-Index Models are high-dimensional regression problems with planted structure,
whereby labels depend on an unknown one-dimensional projection of the input via a …
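The setup can be made concrete with a minimal numpy sketch of data drawn from a Gaussian single-index model; the link function, dimension, and sample size below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 1000                      # ambient dimension and sample count (illustrative)

theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)        # unknown unit direction: the planted 1D projection

def link(z):
    # Illustrative nonlinear link; the paper's choice of link may differ.
    return z ** 2 + z

X = rng.standard_normal((n, d))       # isotropic Gaussian inputs
y = link(X @ theta)                   # labels depend on x only through <x, theta>
```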
Repetita iuvant: Data repetition allows SGD to learn high-dimensional multi-index functions
Neural networks can identify low-dimensional relevant structures within high-dimensional
noisy data, yet our mathematical understanding of how they do so remains scarce. Here, we …
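The mechanism in the title, taking several SGD steps on the same mini-batch before moving on, can be sketched as follows; the teacher, student model, batch size, and repetition count are placeholders, not the paper's protocol.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, k = 50, 2000, 4                       # dimension, samples, repetitions per batch (illustrative)
theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)
X = rng.standard_normal((n, d))
y = np.tanh(X @ theta)                      # placeholder single-index teacher

w = rng.standard_normal(d) / np.sqrt(d)     # student direction
lr, batch = 0.1, 10
for i in range(0, n, batch):
    Xb, yb = X[i:i + batch], y[i:i + batch]
    for _ in range(k):                      # data repetition: k gradient steps on the same batch
        pred = np.tanh(Xb @ w)
        grad = Xb.T @ ((pred - yb) * (1 - pred ** 2)) / batch
        w -= lr * grad

print("alignment |<w, theta>| / |w| =", abs(w @ theta) / np.linalg.norm(w))
```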
Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations
We study the computational and sample complexity of learning a target function $f_*:\mathbb{R}^d\to\mathbb{R}$ with additive structure, that is, $f_*(x)=\frac{1}{\sqrt …
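Since the displayed formula is cut off, here is a hedged sketch of one plausible additive (sum-of-ridge) target; the orthonormal directions, the $1/\sqrt{M}$ normalization, and the per-component links are assumptions of this sketch, not the paper's definition.

```python
import numpy as np

rng = np.random.default_rng(2)
d, M = 64, 8                                         # dimension and number of ridge components (illustrative)

# Orthonormal ridge directions v_1, ..., v_M via a QR decomposition (assumed structure).
V, _ = np.linalg.qr(rng.standard_normal((d, M)))
links = [np.tanh, np.sin, lambda z: z ** 2 - 1] * 3  # diverse per-component link functions

def f_star(x):
    # Additive target: (1 / sqrt(M)) * sum_m sigma_m(<x, v_m>)
    return sum(links[m](x @ V[:, m]) for m in range(M)) / np.sqrt(M)

print(f_star(rng.standard_normal(d)))
```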
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x})=\sigma_*\left(\langle\boldsymbol{x},\boldsymbol …
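A minimal sketch of online SGD for a single-index target, one fresh Gaussian sample per step with projection back to the unit sphere; the ReLU link, step size, and horizon are illustrative choices rather than the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 200
theta = np.eye(d)[0]                      # hidden direction (first basis vector, w.l.o.g.)
relu = lambda z: np.maximum(z, 0.0)       # illustrative link; the paper's sigma_* may differ

w = rng.standard_normal(d)
w /= np.linalg.norm(w)
lr = 1.0 / d                              # step-size scaling chosen for illustration
for _ in range(40 * d):                   # one fresh sample per step: online SGD
    x = rng.standard_normal(d)
    pre = x @ w
    g = (relu(pre) - relu(x @ theta)) * (pre > 0) * x   # squared-loss gradient, matched student
    w -= lr * g
    w /= np.linalg.norm(w)                # project back to the unit sphere

print("overlap <w, theta> =", w @ theta)
```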
A spring-block theory of feature learning in deep neural networks
C Shi, L Pan, I Dokmanić - arXiv preprint arXiv:2407.19353, 2024 - arxiv.org
A central question in deep learning is how deep neural networks (DNNs) learn features.
DNN layers progressively collapse data into a regular low-dimensional geometry. This …
Sliding down the stairs: how correlated latent variables accelerate learning with neural networks
Neural networks extract features from data using stochastic gradient descent (SGD). In
particular, higher-order input cumulants (HOCs) are crucial for their performance. However …
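To see why higher-order input cumulants can carry the signal: with a non-Gaussian latent planted along one direction, the excess kurtosis (a fourth-order cumulant) of a projection is non-negligible only along that direction. A sketch under an assumed Rademacher planting, which is not the paper's data model:

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 30, 200_000
u = np.eye(d)[0]                               # planted direction (illustrative setup)

# Gaussian in every direction except u, where the latent is Rademacher (excess kurtosis -2),
# so second-order statistics are uninformative and only HOCs reveal u.
Z = rng.standard_normal((n, d))
Z[:, 0] = rng.choice([-1.0, 1.0], size=n)

def excess_kurtosis(s):
    s = (s - s.mean()) / s.std()
    return np.mean(s ** 4) - 3.0               # fourth cumulant of the standardized projection

print("along planted u:  ", excess_kurtosis(Z @ u))                       # close to -2
print("along random dir: ", excess_kurtosis(Z @ rng.standard_normal(d)))  # close to 0
```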
On the Complexity of Learning Sparse Functions with Statistical and Gradient Queries
The goal of this paper is to investigate the complexity of gradient algorithms when learning
sparse functions (juntas). We introduce a type of Statistical Queries ($\mathsf{SQ}$), which …
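A generic statistical-query oracle can be simulated as an empirical mean perturbed by up to the tolerance; the correlation queries on a toy 3-junta below are an illustrative stand-in, not the specific $\mathsf{SQ}$ variant the paper introduces.

```python
import numpy as np

rng = np.random.default_rng(5)

def sq_oracle(phi, sample, tau):
    # Answer a statistical query: any value within tau of E[phi(x, y)].
    X, y = sample
    est = np.mean([phi(x, yi) for x, yi in zip(X, y)])
    return est + rng.uniform(-tau, tau)         # the oracle may corrupt the mean by up to tau

# Toy 3-junta over {-1, +1}^d: the label depends only on coordinates 0, 1, 2.
d, n = 10, 20_000
X = rng.choice([-1.0, 1.0], size=(n, d))
y = np.sign(X[:, 0] + X[:, 1] + X[:, 2])

# Correlation queries E[y * x_i] single out the relevant coordinates (~0.5 vs ~0).
for i in range(d):
    print(i, round(sq_oracle(lambda x, yi, i=i: yi * x[i], (X, y), tau=0.01), 3))
```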
Learning Multi-Index Models with Neural Networks via Mean-Field Langevin Dynamics
We study the problem of learning multi-index models in high dimensions using a two-layer
neural network trained with the mean-field Langevin algorithm. Under mild distributional …
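A hedged sketch of such dynamics: each first-layer neuron is a particle updated by a noisy (Langevin) gradient step with L2 regularization; the target, temperature, and step count below are illustrative, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(6)
d, m, n = 20, 256, 4000                           # dimension, particles (neurons), samples (illustrative)
V = np.linalg.qr(rng.standard_normal((d, 2)))[0]  # hidden two-dimensional index subspace
X = rng.standard_normal((n, d))
y = np.tanh(X @ V[:, 0]) * np.tanh(X @ V[:, 1])   # placeholder multi-index target

W = rng.standard_normal((m, d)) / np.sqrt(d)      # particle positions = first-layer weights
a = np.ones(m) / m                                # mean-field output scaling, held fixed
lr, lam = 0.05, 1e-3                              # step size and temperature (illustrative)

for _ in range(500):
    act = np.tanh(X @ W.T)                        # (n, m) activations
    resid = act @ a - y                           # squared-loss residuals
    grad = ((resid[:, None] * (1 - act ** 2)).T @ X) * a[:, None] / n
    W -= lr * (grad + lam * W)                    # gradient step with L2 regularization
    W += np.sqrt(2 * lr * lam) * rng.standard_normal(W.shape)  # Langevin noise

print("train MSE:", np.mean((np.tanh(X @ W.T) @ a - y) ** 2))
```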
Towards a theory of how the structure of language is acquired by deep neural networks
F Cagnetta, M Wyart - arXiv preprint arXiv:2406.00048, 2024 - arxiv.org
How much data is required to learn the structure of a language via next-token prediction?
We study this question for synthetic datasets generated via a Probabilistic Context-Free …
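A toy sampler for a Probabilistic Context-Free Grammar of this kind; the rules and probabilities are invented for illustration and are not the grammar studied in the paper.

```python
import random

random.seed(7)

# Toy PCFG: each nonterminal maps to weighted right-hand sides.
RULES = {
    "S":   [(("NP", "VP"), 1.0)],
    "NP":  [(("Det", "N"), 0.7), (("N",), 0.3)],
    "VP":  [(("V", "NP"), 0.6), (("V",), 0.4)],
    "Det": [(("the",), 0.5), (("a",), 0.5)],
    "N":   [(("cat",), 0.5), (("dog",), 0.5)],
    "V":   [(("sees",), 0.5), (("chases",), 0.5)],
}

def expand(symbol):
    # Symbols with no rule entry are terminals.
    if symbol not in RULES:
        return [symbol]
    options, weights = zip(*RULES[symbol])
    rhs = random.choices(options, weights=weights)[0]
    return [tok for s in rhs for tok in expand(s)]

for _ in range(3):
    print(" ".join(expand("S")))
```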
Pruning is Optimal for Learning Sparse Features in High-Dimensions
NM Vural, MA Erdogdu - arXiv preprint arXiv:2406.08658, 2024 - arxiv.org
While it is commonly observed in practice that pruning networks to a certain level of sparsity
can improve the quality of the features, a theoretical explanation of this phenomenon …
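Magnitude pruning, the generic form of the operation discussed, can be sketched as follows; the paper's specific pruning procedure and sparsity schedule may differ.

```python
import numpy as np

rng = np.random.default_rng(8)

def magnitude_prune(W, sparsity):
    # Zero out the smallest-magnitude fraction `sparsity` of the entries of W.
    k = int(sparsity * W.size)
    if k == 0:
        return W.copy()
    thresh = np.partition(np.abs(W).ravel(), k - 1)[k - 1]
    return np.where(np.abs(W) > thresh, W, 0.0)

W = rng.standard_normal((4, 6))
W_pruned = magnitude_prune(W, sparsity=0.5)
print("nonzeros before/after:", np.count_nonzero(W), np.count_nonzero(W_pruned))
```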