Gradient-based feature learning under structured data
Recent works have demonstrated that the sample complexity of gradient-based learning of
single-index models, i.e., functions that depend on a one-dimensional projection of the input …
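As a concrete illustration of this setting (a minimal sketch of my own, not taken from the paper: the link function, step size, and horizon are assumptions), online SGD on a single neuron against a single-index teacher looks like this:

import numpy as np

rng = np.random.default_rng(0)
d = 256
theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)                      # hidden direction of the single-index target
sigma_star = lambda z: z**3 - 3.0*z                 # illustrative link function (3rd Hermite polynomial)

w = rng.standard_normal(d) / np.sqrt(d)             # student neuron, near-zero initial overlap
lr = 0.05
for t in range(100_000):                            # one fresh sample per step (online SGD)
    x = rng.standard_normal(d)
    y = sigma_star(x @ theta)
    z = x @ w
    grad = 2.0 * (sigma_star(z) - y) * (3.0*z**2 - 3.0) * x   # gradient of the squared loss in w
    w -= lr / d * grad
print("overlap:", abs(w @ theta) / np.linalg.norm(w))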
Bayes-optimal learning of an extensive-width neural network from quadratically many samples
We consider the problem of learning a target function corresponding to a single-hidden-layer
neural network with a quadratic activation function after the first layer and random weights …
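A minimal sketch of such a target (the width, readout, and sample scaling below are my assumptions): a planted network with quadratic activation and Gaussian random weights, observed through quadratically many samples.

import numpy as np

rng = np.random.default_rng(1)
d, m = 100, 150                                    # "extensive width": m of the same order as d
W = rng.standard_normal((m, d)) / np.sqrt(d)       # random first-layer weights
f_star = lambda X: ((X @ W.T)**2).mean(axis=1)     # quadratic activation, uniform readout

n = 2 * d**2                                       # quadratically many samples in d
X = rng.standard_normal((n, d))
y = f_star(X)                                      # noiseless labels for illustration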
Online Learning and Information Exponents: On The Importance of Batch Size, and Time/Complexity Tradeoffs
We study the impact of the batch size $n_b$ on the iteration time $T$ of training two-layer
neural networks with one-pass stochastic gradient descent (SGD) on multi-index target …
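The regime being studied can be summarized by a small skeleton (grad_fn is a hypothetical oracle; the names and bookkeeping are mine): every iteration consumes a fresh batch, so the total sample cost is $n_b \cdot T$ and the question is how $T$ shrinks as $n_b$ grows.

import numpy as np

def one_pass_sgd(grad_fn, w0, n_b, T, lr, d, rng):
    """One-pass mini-batch SGD: each of the T iterations draws n_b fresh samples."""
    w = w0.copy()
    for _ in range(T):
        X = rng.standard_normal((n_b, d))   # fresh Gaussian batch, never reused
        w -= lr * grad_fn(w, X)             # grad_fn: placeholder stochastic gradient oracle
    return w                                # total sample cost: n_b * T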
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit
We study the problem of gradient descent learning of a single-index target function
$f_*(\boldsymbol{x}) = \sigma_*\left(\langle\boldsymbol{x}, \boldsymbol …
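The snippet is cut off mid-formula; in the standard single-index notation it would presumably continue along the lines of $f_*(\boldsymbol{x}) = \sigma_*\left(\langle\boldsymbol{x}, \boldsymbol{\theta}\rangle\right)$ with a planted direction $\boldsymbol{\theta} \in \mathbb{S}^{d-1}$ and, typically, $\boldsymbol{x} \sim \mathcal{N}(\boldsymbol{0}, \boldsymbol{I}_d)$ (the symbol $\boldsymbol{\theta}$ and the Gaussian input are assumptions on my part).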
Learning multi-index models with neural networks via mean-field Langevin dynamics
We study the problem of learning multi-index models in high dimensions using a two-layer
neural network trained with the mean-field Langevin algorithm. Under mild distributional …
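A finite-particle sketch of this kind of algorithm (the toy target, tanh activation, and the temperature/regularization values are my assumptions): each neuron is a particle updated by a noisy, regularized gradient step.

import numpy as np

rng = np.random.default_rng(2)
d, width, n = 20, 256, 1000
X = rng.standard_normal((n, d))
y = np.tanh(X[:, 0] * X[:, 1])                     # toy multi-index target (two relevant directions)

W = rng.standard_normal((width, d)) / np.sqrt(d)   # neurons = particles
lr, lam, beta = 0.05, 1e-3, 1e4                    # step size, L2 penalty, inverse temperature
for _ in range(300):
    H = np.tanh(X @ W.T)                           # (n, width) hidden activations
    resid = H.mean(axis=1) - y                     # mean-field readout, squared-loss residual
    grad = ((1.0 - H**2).T * resid) @ X / (n * width) + lam * W
    W += -lr * grad + np.sqrt(2.0 * lr / beta) * rng.standard_normal(W.shape)   # Langevin noise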
SGD with memory: fundamental properties and stochastic acceleration
D Yarotsky, M Velikanov - arXiv preprint arXiv:2410.04228, 2024 - arxiv.org
An important open problem is the theoretically feasible acceleration of mini-batch SGD-type
algorithms on quadratic problems with power-law spectrum. In the non-stochastic setting, the …
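A minimal instance of the setting (the spectrum exponent, batch size, and the heavy-ball form of the memory term are my assumptions): mini-batch SGD with a single momentum/memory variable on a quadratic whose Hessian eigenvalues decay as a power law.

import numpy as np

rng = np.random.default_rng(3)
d = 500
eigs = np.arange(1, d + 1, dtype=float)**(-1.5)    # power-law spectrum: lambda_k ~ k^(-nu)
w_star = rng.standard_normal(d)

w, v = np.zeros(d), np.zeros(d)                     # v is the single "memory" (momentum) variable
lr, mom, n_b = 0.1, 0.9, 32
for _ in range(4000):
    Xb = rng.standard_normal((n_b, d)) * np.sqrt(eigs)   # mini-batch with covariance diag(eigs)
    g = Xb.T @ (Xb @ (w - w_star)) / n_b                 # stochastic gradient of the quadratic loss
    v = mom * v - lr * g
    w = w + v
print("quadratic loss:", 0.5 * np.sum(eigs * (w - w_star)**2))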
The High Line: Exact Risk and Learning Rate Curves of Stochastic Adaptive Learning Rate Algorithms
E Collins-Woodfin, I Seroussi… - arXiv preprint arXiv …, 2024 - arxiv.org
We develop a framework for analyzing the training and learning rate dynamics on a large
class of high-dimensional optimization problems, which we call the high line, trained using …
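One representative stochastic adaptive learning-rate rule, written as a small sketch (whether this exact AdaGrad-Norm schedule is among those analyzed is an assumption on my part; grad_fn is a hypothetical oracle):

import numpy as np

def adagrad_norm(grad_fn, w0, T, eta=1.0, b=1e-8, rng=None):
    """Scalar adaptive step: eta_t = eta / sqrt(b + sum_{s<=t} ||g_s||^2)."""
    w, acc = w0.copy(), b
    for _ in range(T):
        g = grad_fn(w, rng)                  # stochastic gradient oracle (placeholder)
        acc += float(g @ g)                  # accumulate squared gradient norms
        w -= eta / np.sqrt(acc) * g
    return w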
High dimensional analysis reveals conservative sharpening and a stochastic edge of stability
A Agarwala, J Pennington - arXiv preprint arXiv:2404.19261, 2024 - arxiv.org
Recent empirical and theoretical work has shown that the dynamics of the large eigenvalues
of the training loss Hessian have some remarkably robust features across models and …
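These large-eigenvalue dynamics are usually tracked by estimating the sharpness along training and comparing it to the classical 2/eta stability threshold; a small sketch (hvp_fn is a hypothetical Hessian-vector-product oracle):

import numpy as np

def sharpness(hvp_fn, d, iters=50, rng=None):
    """Top Hessian eigenvalue estimated by power iteration on Hessian-vector products."""
    rng = rng or np.random.default_rng()
    v = rng.standard_normal(d)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = hvp_fn(v)
        v /= np.linalg.norm(v)
    return float(v @ hvp_fn(v))              # Rayleigh quotient estimate

# Edge-of-stability check for (full-batch) gradient descent with step size eta:
# at_edge = sharpness(hvp_fn, d) >= 2.0 / eta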
Gradient descent inference in empirical risk minimization
Gradient descent is one of the most widely used iterative algorithms in modern statistical
learning. However, its precise algorithmic dynamics in high-dimensional settings remain …
A non-asymptotic theory of Kernel Ridge Regression: deterministic equivalents, test error, and GCV estimator
T Misiakiewicz, B Saeed - arXiv preprint arXiv:2403.08938, 2024 - arxiv.org
We consider learning an unknown target function $f_*$ using kernel ridge regression
(KRR) given i.i.d. data $(u_i, y_i)$, $i \leq n$, where $u_i \in U$ is a covariate vector and …
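For reference, the estimator under study in its textbook closed form (the RBF kernel and the lambda*n ridge convention below are illustrative choices, not necessarily the paper's):

import numpy as np

def krr_fit_predict(U, y, U_test, lam=1e-2, gamma=1.0):
    """Kernel ridge regression: f_hat(u) = k(u, U) (K + lam*n*I)^{-1} y."""
    def rbf(A, B):
        sq = ((A[:, None, :] - B[None, :, :])**2).sum(axis=-1)
        return np.exp(-gamma * sq)
    n = len(y)
    K = rbf(U, U)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y)
    return rbf(U_test, U) @ alpha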