Learning in the presence of low-dimensional structure: a spiked random matrix perspective
J Ba, MA Erdogdu, T Suzuki… - Advances in Neural …, 2024 - proceedings.neurips.cc
We consider the learning of a single-index target function $f_*:\mathbb{R}^d\to\mathbb{R}$
under spiked covariance data: $$f_*(\boldsymbol{x})=\textstyle\sigma_*(\frac{1}{\sqrt…
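As a concrete illustration of this setting (not taken from the paper itself: the spike strength, the link function $\sigma_*$, and the dimensions below are assumptions), a minimal NumPy sketch that draws spiked-covariance inputs and evaluates a single-index label:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, kappa = 512, 1000, 5.0        # assumed dimension, sample size, spike strength

# Spiked covariance Sigma = I_d + kappa * u u^T for a unit spike direction u.
u = rng.standard_normal(d); u /= np.linalg.norm(u)
Z = rng.standard_normal((n, d))
X = Z + (np.sqrt(1 + kappa) - 1) * (Z @ u)[:, None] * u[None, :]   # rows ~ N(0, I + kappa u u^T)

# Single-index target f_*(x) = sigma_*(<x, theta> / sqrt(d)); sigma_* assumed to be tanh.
theta = rng.choice([-1.0, 1.0], size=d)    # ||theta|| = sqrt(d)
y = np.tanh(X @ theta / np.sqrt(d))
```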
Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond
A neural network in the mean-field regime is known to be capable of feature learning,
unlike its kernel (NTK) counterpart. Recent works have shown that mean-field neural …
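In this regime the network output is an average over many neurons, and mean-field Langevin dynamics amounts to noisy gradient descent on all neuron parameters, with the noise playing the role of an entropy term. A toy discretized sketch on a 2-sparse parity (the target, width, step size, and temperature are assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 20, 256, 2000                 # assumed input dimension, width, sample size
lr, lam, beta = 0.5, 1e-3, 1e4          # assumed step size, L2 penalty, inverse temperature

# Target: 2-sparse parity over the first two coordinates of x in {-1, +1}^d.
X = rng.choice([-1.0, 1.0], size=(n, d))
y = X[:, 0] * X[:, 1]

# Mean-field parameterization f(x) = (1/m) * sum_j a_j * tanh(<w_j, x>).
W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=m)

for t in range(500):
    H = np.tanh(X @ W.T)                                  # (n, m) hidden activations
    err = H @ a / m - y                                   # squared-loss residual
    grad_a = H.T @ err / n + lam * a                      # 1/m output factor folded into lr
    grad_W = ((err[:, None] * (1 - H**2) * a[None, :]).T @ X) / n + lam * W
    # Noisy (Langevin) update on every neuron's parameters.
    a -= lr * grad_a + np.sqrt(2 * lr / beta) * rng.standard_normal(m)
    W -= lr * grad_W + np.sqrt(2 * lr / beta) * rng.standard_normal((m, d))
```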
The benefits of reusing batches for gradient descent in two-layer networks: Breaking the curse of information and leap exponents
We investigate the training dynamics of two-layer neural networks when learning multi-index
target functions. We focus on multi-pass gradient descent (GD) that reuses the batches …
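A minimal sketch of the batch-reuse protocol in question, with a single trainable neuron standing in for the two-layer network and an assumed multi-index target; the batch size, number of passes per batch, and step size are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_b, passes, lr = 50, 256, 4, 0.1        # assumed dimension, batch size, passes per batch, step size
u, v = np.eye(d)[0], np.eye(d)[1]           # two hidden directions of an assumed multi-index target

def target(X):                               # f_*(x) = He2(<x,u>) + He2(<x,v>), with He2(z) = z^2 - 1
    return (X @ u)**2 - 1 + (X @ v)**2 - 1

w = rng.standard_normal(d) / np.sqrt(d)
for step in range(100):
    X = rng.standard_normal((n_b, d)); y = target(X)
    for _ in range(passes):                  # reuse the same batch several times (multi-pass GD)
        pred = np.tanh(X @ w)
        w -= lr * X.T @ ((pred - y) * (1 - pred**2)) / n_b
```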
Asymptotics of feature learning in two-layer networks after one gradient-step
In this manuscript we investigate the problem of how two-layer neural networks learn
features from data, and improve over the kernel regime, after being trained with a single …
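The protocol analyzed in this line of work is roughly: take one (large) full-batch gradient step on the first-layer weights, then refit the second layer, e.g. by ridge regression. A schematic sketch; the teacher, width, learning-rate scale, and ridge penalty are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, m, n = 100, 200, 2000                      # assumed dimension, width, sample size
eta = np.sqrt(m)                              # assumed large (width-scaled) step size for the single step

theta = rng.standard_normal(d); theta /= np.linalg.norm(theta)
X = rng.standard_normal((n, d)); y = np.tanh(X @ theta)    # assumed single-index teacher

W = rng.standard_normal((m, d)) / np.sqrt(d)                # first-layer weights
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)            # fixed second layer during the first step

# One full-batch gradient step on the first layer only.
H = np.tanh(X @ W.T)
err = H @ a - y
grad_W = (a[None, :] * (1 - H**2) * err[:, None]).T @ X / n
W1 = W - eta * grad_W

# Then refit the second layer by ridge regression on the updated features.
Phi = np.tanh(X @ W1.T)
a_hat = np.linalg.solve(Phi.T @ Phi + 1e-2 * np.eye(m), Phi.T @ y)
```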
Nonlinear spiked covariance matrices and signal propagation in deep neural networks
Many recent works have studied the eigenvalue spectrum of the Conjugate Kernel (CK)
defined by the nonlinear feature map of a feedforward neural network. However, existing …
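For reference, the Conjugate Kernel here is the Gram matrix of the network's post-activation feature map; a quick sketch of its empirical spectrum for a single random layer (Gaussian weights, isotropic data, and tanh activation are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 300, 400, 500                        # assumed input dimension, width, sample size

X = rng.standard_normal((n, d)) / np.sqrt(d)   # isotropic data with normalized rows
W = rng.standard_normal((m, d))                # random first-layer weights

Phi = np.tanh(X @ W.T)                         # nonlinear feature map, shape (n, m)
CK = Phi @ Phi.T / m                           # Conjugate Kernel Gram matrix
eigs = np.linalg.eigvalsh(CK)                  # spectrum: a bulk plus possible spike outliers
print(eigs[-5:])                               # a few largest eigenvalues
```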
Repetita iuvant: Data repetition allows SGD to learn high-dimensional multi-index functions
Neural networks can identify low-dimensional relevant structures within high-dimensional
noisy data, yet our mathematical understanding of how they do so remains scarce. Here, we …
Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations
We study the computational and sample complexity of learning a target function
$f_*:\mathbb{R}^d\to\mathbb{R}$ with additive structure, that is, $f_*(x)=\frac{1}{\sqrt…
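A concrete instance of such an additive ("ridge combination") target, with the number of directions M, their orthogonality, and the link function all assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d, M, n = 128, 8, 1000                             # assumed dimension, number of ridge directions, samples

V, _ = np.linalg.qr(rng.standard_normal((d, M)))   # orthonormal directions v_1, ..., v_M (assumed)
X = rng.standard_normal((n, d))

# Additive target f_*(x) = (1/sqrt(M)) * sum_m sigma(<v_m, x>), sigma assumed to be He2(z) = z^2 - 1.
Z = X @ V                                          # (n, M) projections onto the directions
y = (Z**2 - 1).sum(axis=1) / np.sqrt(M)
```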
Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs
We study the impact of the batch size $n_b$ on the iteration time $T$ of training two-layer
neural networks with one-pass stochastic gradient descent (SGD) on multi-index target …
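A schematic of the one-pass protocol being compared across batch sizes: each of the $T$ iterations consumes a fresh batch of size $n_b$, so no sample is ever reused. The single-neuron student, step size, and He3 link below are assumptions:

```python
import numpy as np

def one_pass_sgd(n_b, T, d=64, lr=0.05, seed=0):
    """T one-pass SGD steps, each on a fresh batch of size n_b (toy single-neuron student)."""
    rng = np.random.default_rng(seed)
    theta = np.eye(d)[0]                            # assumed single-index teacher direction
    w = rng.standard_normal(d) / np.sqrt(d)
    for _ in range(T):
        X = rng.standard_normal((n_b, d))           # fresh samples every iteration: no reuse
        z = X @ theta
        y = z**3 - 3 * z                            # assumed He3 link (information exponent 3)
        pred = np.tanh(X @ w)
        w -= lr * X.T @ ((pred - y) * (1 - pred**2)) / n_b
    return abs(w @ theta) / np.linalg.norm(w)       # overlap with the teacher direction

# Compare a small and a large batch size at the same total sample budget n_b * T.
print(one_pass_sgd(n_b=8, T=12800), one_pass_sgd(n_b=512, T=200))
```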
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit
We study the problem of gradient descent learning of a single-index target function
$f_*(\boldsymbol{x})=\textstyle\sigma_*\left(\langle\boldsymbol{x},\boldsymbol…
Improved statistical and computational complexity of the mean-field Langevin dynamics under structured data
Recent works have shown that neural networks optimized by gradient-based methods can
adapt to sparse or low-dimensional target functions through feature learning; an often …