Learning in the presence of low-dimensional structure: a spiked random matrix perspective

J Ba, MA Erdogdu, T Suzuki… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
We consider the learning of a single-index target function $f_*:\mathbb{R}^d \to \mathbb{R}$ under spiked covariance data: $$f_*(\boldsymbol{x}) = \textstyle\sigma_*(\frac{1}{\sqrt …
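
A minimal sketch of this data model, assuming a rank-one spike of strength theta along the target direction and sigma_* = ReLU (both are illustrative choices, since the snippet truncates the exact definition):

import numpy as np

rng = np.random.default_rng(0)
d, n, theta = 512, 1000, 5.0                 # dimension, samples, spike strength (assumed)
v = rng.standard_normal(d)
v /= np.linalg.norm(v)                       # target index / spike direction

# spiked covariance inputs: x ~ N(0, I_d + theta * v v^T)
z = rng.standard_normal((n, d))
x = z + (np.sqrt(1.0 + theta) - 1.0) * np.outer(z @ v, v)

# single-index labels: y = sigma_*(<x, v> / sqrt(d))
y = np.maximum(x @ v / np.sqrt(d), 0.0)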

Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond

T Suzuki, D Wu, K Oko… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
Neural networks in the mean-field regime are known to be capable of feature learning,
unlike their kernel (NTK) counterparts. Recent works have shown that mean-field neural …
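
A rough sketch of the sparse-parity setup with a single mean-field Langevin step (noisy gradient descent on the neuron parameters); the k = 3 parity, tanh activation, and temperature below are assumptions, not the paper's exact setting:

import numpy as np

rng = np.random.default_rng(1)
d, k, n, width = 30, 3, 2000, 512
x = rng.choice([-1.0, 1.0], size=(n, d))
y = np.prod(x[:, :k], axis=1)                # k-sparse parity of the first k coordinates

# mean-field two-layer network: f(x) = (1/width) * sum_j a_j * tanh(w_j . x)
w = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=width)

lr, temp = 0.5, 1e-4
pre = np.tanh(x @ w.T)                       # (n, width) activations
resid = pre @ a / width - y                  # network output minus target
grad_w = ((resid[:, None] * (1.0 - pre**2)).T @ x) * (a[:, None] / (width * n))
# Langevin step: gradient descent plus Gaussian noise at temperature `temp`
w += -lr * grad_w + np.sqrt(2.0 * lr * temp) * rng.standard_normal(w.shape)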

The benefits of reusing batches for gradient descent in two-layer networks: Breaking the curse of information and leap exponents

Y Dandi, E Troiani, L Arnaboldi, L Pesce… - arXiv preprint arXiv …, 2024 - arxiv.org
We investigate the training dynamics of two-layer neural networks when learning multi-index
target functions. We focus on multi-pass gradient descent (GD) that reuses the batches …
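
A toy version of the multi-pass protocol (two-layer ReLU student, a product of two linear features as the multi-index target; all sizes are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(2)
d, n, width, lr, passes = 200, 400, 128, 0.1, 50
u = np.linalg.qr(rng.standard_normal((d, 2)))[0].T   # two hidden target directions
x = rng.standard_normal((n, d))                      # one fixed batch, reused every pass
y = (x @ u[0]) * (x @ u[1])

w = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.standard_normal(width) / np.sqrt(width)
for _ in range(passes):                              # multi-pass GD on the same batch
    h = np.maximum(x @ w.T, 0.0)
    resid = h @ a - y
    w -= lr * ((resid[:, None] * (h > 0)).T @ x) * (a[:, None] / n)

# fraction of first-layer weight norm captured by the hidden subspace
print(np.linalg.norm(w @ u.T) / np.linalg.norm(w))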

Asymptotics of feature learning in two-layer networks after one gradient-step

H Cui, L Pesce, Y Dandi, F Krzakala, YM Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
In this manuscript we investigate how two-layer neural networks learn
features from data and improve over the kernel regime after being trained with a single …
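
A sketch of the one-step experiment; the single-index teacher, ReLU student, and sqrt(width) learning rate are assumptions. The point it illustrates is that the first-layer update after one large step is close to rank one and aligned with the target direction:

import numpy as np

rng = np.random.default_rng(3)
d, n, width = 300, 3000, 256
v = rng.standard_normal(d)
v /= np.linalg.norm(v)
x = rng.standard_normal((n, d))
y = np.tanh(x @ v)                                   # illustrative teacher

w = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=width) / np.sqrt(width)

# a single large gradient step on the first layer
h = np.maximum(x @ w.T, 0.0)
resid = h @ a - y
grad_w = ((resid[:, None] * (h > 0)).T @ x) * (a[:, None] / n)
w1 = w - np.sqrt(width) * grad_w

_, s, vt = np.linalg.svd(w1 - w, full_matrices=False)
print("alignment of top singular vector with v:", abs(vt[0] @ v))
print("spectral gap s1/s2:", s[0] / s[1])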

Nonlinear spiked covariance matrices and signal propagation in deep neural networks

Z Wang, D Wu, Z Fan - arXiv preprint arXiv:2402.10127, 2024 - arxiv.org
Many recent works have studied the eigenvalue spectrum of the Conjugate Kernel (CK)
defined by the nonlinear feature map of a feedforward neural network. However, existing …
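
A small numerical illustration of the CK spectrum under a spiked input covariance (ReLU feature map; the spike strength and aspect ratios are arbitrary choices):

import numpy as np

rng = np.random.default_rng(4)
d, n, width, theta = 400, 800, 1200, 8.0
v = rng.standard_normal(d)
v /= np.linalg.norm(v)

# inputs with covariance I_d + theta * v v^T
z = rng.standard_normal((n, d))
x = z + (np.sqrt(1.0 + theta) - 1.0) * np.outer(z @ v, v)

# Conjugate Kernel: Gram matrix of the post-activation features sigma(x W / sqrt(d))
w = rng.standard_normal((d, width))
f = np.maximum(x @ w / np.sqrt(d), 0.0)
ck = f @ f.T / width
eigs = np.linalg.eigvalsh(ck)
print("top five CK eigenvalues:", eigs[-5:][::-1])   # for large theta, an outlier separates from the bulk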

Repetita iuvant: Data repetition allows SGD to learn high-dimensional multi-index functions

L Arnaboldi, Y Dandi, F Krzakala, L Pesce… - arXiv preprint arXiv …, 2024 - arxiv.org
Neural networks can identify low-dimensional relevant structures within high-dimensional
noisy data, yet our mathematical understanding of how they do so remains scarce. Here, we …
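
A minimal experiment contrasting single use against repeated use of each fresh batch; the single-neuron student, He_3 teacher, and step size are illustrative, and whether (and how fast) the overlap grows under repetition is exactly what results of this kind quantify:

import numpy as np

rng = np.random.default_rng(5)
d, nb, steps = 100, 64, 1000
v = rng.standard_normal(d)
v /= np.linalg.norm(v)

for reps in (1, 4):                          # how often each batch is reused
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for _ in range(steps):
        x = rng.standard_normal((nb, d))     # fresh batch ...
        s_t = x @ v
        y = s_t**3 - 3.0 * s_t               # He_3 teacher: high information exponent
        for _ in range(reps):                # ... reused `reps` times before discarding
            s = x @ w
            resid = (s**3 - 3.0 * s) - y
            w -= 0.01 * ((resid * (3.0 * s**2 - 3.0)) @ x) / nb
            w /= np.linalg.norm(w)           # keep the student on the sphere
    print(f"reps={reps}: overlap |<w, v>| = {abs(w @ v):.3f}")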

Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations

K Oko, Y Song, T Suzuki, D Wu - arXiv preprint arXiv:2406.11828, 2024 - arxiv.org
We study the computational and sample complexity of learning a target function $f_*:\mathbb{R}^d \to \mathbb{R}$ with additive structure, that is, $f_*(x) = \frac{1}{\sqrt …
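
A sketch of the additive ("sum of ridge functions") data model; the orthonormal directions, the He_2 link, and the sizes are assumptions standing in for the truncated formula:

import numpy as np

rng = np.random.default_rng(6)
d, M, n = 256, 8, 1000
V = np.linalg.qr(rng.standard_normal((d, M)))[0]     # M orthonormal ridge directions
x = rng.standard_normal((n, d))

# f_*(x) = (1/sqrt(M)) * sum_m sigma(<x, v_m>), here with sigma(z) = z^2 - 1
s = x @ V                                            # (n, M) ridge projections
y = (s**2 - 1.0).sum(axis=1) / np.sqrt(M)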

Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs

L Arnaboldi, Y Dandi, F Krzakala, B Loureiro… - arXiv preprint arXiv …, 2024 - arxiv.org
We study the impact of the batch size $n_b$ on the iteration time $T$ of training two-layer
neural networks with one-pass stochastic gradient descent (SGD) on multi-index target …
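
A toy run of one-pass SGD at several batch sizes under a fixed total sample budget, so larger $n_b$ means fewer iterations $T$ (quadratic single-index teacher; the step-size scaling is an arbitrary choice):

import numpy as np

rng = np.random.default_rng(7)
d, budget = 100, 50_000                  # total samples fixed across batch sizes
v = rng.standard_normal(d)
v /= np.linalg.norm(v)

for nb in (1, 50, 1000):                 # batch size n_b; iterations T = budget / n_b
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    lr = 0.02 * np.sqrt(nb)              # illustrative step-size scaling
    for _ in range(budget // nb):
        x = rng.standard_normal((nb, d)) # one-pass: every batch is fresh
        resid = (x @ w)**2 - (x @ v)**2  # student and teacher share sigma(z) = z^2
        w -= lr * ((resid * 2.0 * (x @ w)) @ x) / nb
        w /= np.linalg.norm(w)
    print(f"n_b={nb:4d}, T={budget // nb:5d}: overlap = {abs(w @ v):.3f}")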

Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit

JD Lee, K Oko, T Suzuki, D Wu - arXiv preprint arXiv:2406.01581, 2024 - arxiv.org
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x}) = \textstyle\sigma_*\left(\langle \boldsymbol{x}, \boldsymbol …
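
Not the paper's algorithm, but a standard spectral warm start for polynomial single-index models, included only to make the setting concrete (the degree-3 link with a nonzero second Hermite coefficient is an assumption):

import numpy as np

rng = np.random.default_rng(8)
d, n = 200, 20_000
theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)
x = rng.standard_normal((n, d))
s = x @ theta
y = (s**3 - 3.0 * s) + (s**2 - 1.0)          # illustrative polynomial link

# E[y * (x x^T - I)] is proportional to theta theta^T when the link has a
# nonzero second Hermite coefficient, so the top eigenvector estimates theta
M = (x.T * y) @ x / n - y.mean() * np.eye(d)
w0 = np.linalg.eigh(M)[1][:, -1]
print("warm-start overlap |<w0, theta>|:", abs(w0 @ theta))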

Improved statistical and computational complexity of the mean-field Langevin dynamics under structured data

A Nitanda, K Oko, T Suzuki, D Wu - The Twelfth International Conference on Learning Representations, 2024 - openreview.net
Recent works have shown that neural networks optimized by gradient-based methods can
adapt to sparse or low-dimensional target functions through feature learning; an often …
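
A crude mean-field Langevin loop on anisotropic ("structured") inputs, where the Gaussian noise implements the entropy term and weight decay the ridge penalty; the covariance profile, target, and hyperparameters are fabricated for illustration:

import numpy as np

rng = np.random.default_rng(9)
d, n, width = 50, 1000, 256
scales = np.concatenate([3.0 * np.ones(5), 0.5 * np.ones(d - 5)])
x = rng.standard_normal((n, d)) * scales     # anisotropic covariance: a few large directions
v = np.zeros(d)
v[:5] = 1.0 / np.sqrt(5.0)
y = np.sign(x @ v)                           # target supported on the large directions

w = rng.standard_normal((width, d))
lr, lam, temp = 0.2, 1e-3, 1e-3
for _ in range(500):
    pre = np.tanh(x @ w.T)                   # (n, width) activations
    resid = pre.mean(axis=1) - y             # mean-field output minus target
    grad = ((resid[:, None] * (1.0 - pre**2)).T @ x) / (n * width)
    # MFLD update: gradient + ridge (lam) + Gaussian noise (entropy, temp)
    w -= lr * (grad + lam * w)
    w += np.sqrt(2.0 * lr * temp) * rng.standard_normal(w.shape)

print("train accuracy:", np.mean(np.sign(np.tanh(x @ w.T).mean(axis=1)) == y))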