Explaining neural scaling laws

Y Bahri, E Dyer, J Kaplan, J Lee, U Sharma - Proceedings of the National …, 2024 - pnas.org
The population loss of trained deep neural networks often follows precise power-law scaling
relations with either the size of the training dataset or the number of parameters in the …

Benign, tempered, or catastrophic: Toward a refined taxonomy of overfitting

N Mallinar, J Simon, A Abedsoltan… - Advances in …, 2022 - proceedings.neurips.cc
The practical success of overparameterized neural networks has motivated the recent
scientific study of 'interpolating methods'--learning methods which are able to fit their …

More than a toy: Random matrix models predict how real-world neural representations generalize

A Wei, W Hu, J Steinhardt - International Conference on …, 2022 - proceedings.mlr.press
Of theories for why large-scale machine learning models generalize despite being vastly
overparameterized, which of their assumptions are needed to capture the qualitative …

Deterministic equivalent and error universality of deep random features learning

D Schröder, H Cui, D Dmitriev… - … on Machine Learning, 2023 - proceedings.mlr.press
This manuscript considers the problem of learning a random Gaussian network function
using a fully connected network with frozen intermediate layers and trainable readout layer …

Bayes-optimal learning of deep random networks of extensive-width

H Cui, F Krzakala, L Zdeborová - … Conference on Machine …, 2023 - proceedings.mlr.press
We consider the problem of learning a target function corresponding to a deep, extensive-
width, non-linear neural network with random Gaussian weights. We consider the asymptotic …

How two-layer neural networks learn, one (giant) step at a time

Y Dandi, F Krzakala, B Loureiro, L Pesce… - arXiv preprint arXiv …, 2023 - arxiv.org
We investigate theoretically how the features of a two-layer neural network adapt to the
structure of the target function through a few large batch gradient descent steps, leading to …

On the asymptotic learning curves of kernel ridge regression under power-law decay

Y Li, Q Lin - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
The widely observed 'benign overfitting phenomenon' in the neural network literature raises
a challenge to the 'bias-variance trade-off' doctrine in statistical learning theory. Since …

Are Gaussian data all you need? The extents and limits of universality in high-dimensional generalized linear estimation

L Pesce, F Krzakala, B Loureiro… - … on Machine Learning, 2023 - proceedings.mlr.press
In this manuscript we consider the problem of generalized linear estimation on Gaussian
mixture data with labels given by a single-index model. Our first result is a sharp asymptotic …

On the optimality of misspecified kernel ridge regression

H Zhang, Y Li, W Lu, Q Lin - International Conference on …, 2023 - proceedings.mlr.press
In the misspecified kernel ridge regression problem, researchers usually assume the
underlying true function $f_\rho^\star \in [\mathcal{H}]^s$, a less-smooth …

Spectrum of inner-product kernel matrices in the polynomial regime and multiple descent phenomenon in kernel ridge regression

T Misiakiewicz - arXiv preprint arXiv:2204.10425, 2022 - arxiv.org
We study the spectrum of inner-product kernel matrices, i.e., $n \times n$ matrices with
entries $h(\langle \mathbf{x}_i, \mathbf{x}_j \rangle / d)$ where the $(\mathbf{x}_i)_{i \leq n}$ …