Explaining neural scaling laws

Y Bahri, E Dyer, J Kaplan, J Lee, U Sharma - Proceedings of the National …, 2024 - pnas.org
The population loss of trained deep neural networks often follows precise power-law scaling
relations with either the size of the training dataset or the number of parameters in the …

Benign, tempered, or catastrophic: Toward a refined taxonomy of overfitting

N Mallinar, J Simon, A Abedsoltan… - Advances in …, 2022 - proceedings.neurips.cc
The practical success of overparameterized neural networks has motivated the recent
scientific study of 'interpolating methods'--learning methods which are able to fit their …

More than a toy: Random matrix models predict how real-world neural representations generalize

A Wei, W Hu, J Steinhardt - International Conference on …, 2022 - proceedings.mlr.press
Of theories for why large-scale machine learning models generalize despite being vastly
overparameterized, which of their assumptions are needed to capture the qualitative …

Deterministic equivalent and error universality of deep random features learning

D Schröder, H Cui, D Dmitriev… - … on Machine Learning, 2023 - proceedings.mlr.press
This manuscript considers the problem of learning a random Gaussian network function
using a fully connected network with frozen intermediate layers and trainable readout layer …

Bayes-optimal learning of deep random networks of extensive-width

H Cui, F Krzakala, L Zdeborová - … Conference on Machine …, 2023 - proceedings.mlr.press
We consider the problem of learning a target function corresponding to a deep, extensive-
width, non-linear neural network with random Gaussian weights. We consider the asymptotic …

How two-layer neural networks learn, one (giant) step at a time

Y Dandi, F Krzakala, B Loureiro, L Pesce… - arXiv preprint arXiv …, 2023 - arxiv.org
We investigate theoretically how the features of a two-layer neural network adapt to the
structure of the target function through a few large batch gradient descent steps, leading to …

On the asymptotic learning curves of kernel ridge regression under power-law decay

Y Li, Q Lin - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
The widely observed 'benign overfitting phenomenon' in the neural network literature raises
a challenge to the 'bias-variance trade-off' doctrine in statistical learning theory. Since …

Are Gaussian data all you need? The extents and limits of universality in high-dimensional generalized linear estimation

L Pesce, F Krzakala, B Loureiro… - … on Machine Learning, 2023 - proceedings.mlr.press
In this manuscript we consider the problem of generalized linear estimation on Gaussian
mixture data with labels given by a single-index model. Our first result is a sharp asymptotic …

On the optimality of misspecified kernel ridge regression

H Zhang, Y Li, W Lu, Q Lin - International Conference on …, 2023 - proceedings.mlr.press
In the misspecified kernel ridge regression problem, researchers usually assume the
underlying true function $f_\rho^\star \in [\mathcal{H}]^s$, a less-smooth …

Spectrum of inner-product kernel matrices in the polynomial regime and multiple descent phenomenon in kernel ridge regression

T Misiakiewicz - arXiv preprint arXiv:2204.10425, 2022 - arxiv.org
We study the spectrum of inner-product kernel matrices, i.e., $n \times n$ matrices with
entries $h(\langle \mathbf{x}_i, \mathbf{x}_j \rangle / d)$ where the $(\mathbf{x}_i)_{i \leq n}$ …