Explaining neural scaling laws
The population loss of trained deep neural networks often follows precise power-law scaling
relations with either the size of the training dataset or the number of parameters in the …
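As a rough illustration of the kind of relation this abstract refers to (not an implementation from the paper), the following Python sketch fits a saturating power law $L(N) \approx a N^{-b} + c$ to hypothetical loss-versus-dataset-size measurements; all values and names are made up for the example.

    # Minimal sketch: fit a power law L(N) ~ a * N**(-b) + c to made-up loss data.
    import numpy as np
    from scipy.optimize import curve_fit

    def power_law(N, a, b, c):
        return a * N ** (-b) + c

    # Hypothetical test-loss measurements at increasing training-set sizes.
    N = np.array([1e3, 3e3, 1e4, 3e4, 1e5])
    loss = np.array([0.90, 0.62, 0.45, 0.34, 0.28])

    (a, b, c), _ = curve_fit(power_law, N, loss, p0=(10.0, 0.3, 0.1))
    print(f"fitted exponent b = {b:.2f}, irreducible loss c = {c:.3f}")
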
Benign, tempered, or catastrophic: Toward a refined taxonomy of overfitting
The practical success of overparameterized neural networks has motivated the recent
scientific study of \emph{interpolating methods}--learning methods which are able to fit their …
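For concreteness, a minimal example of an interpolating method in the sense used here is minimum-norm (ridgeless) linear regression in an overparameterized setting; the sketch below is illustrative and not taken from the paper.

    # Sketch: minimum-norm linear regression with more parameters than samples.
    # The fitted model interpolates: it matches every training label exactly.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 50, 200                      # d > n: overparameterized
    X = rng.standard_normal((n, d))
    y = rng.standard_normal(n)          # arbitrary labels for the illustration

    w = np.linalg.pinv(X) @ y           # minimum-norm interpolating solution
    print("max training residual:", np.max(np.abs(X @ w - y)))  # ~ 0
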
More than a toy: Random matrix models predict how real-world neural representations generalize
Of theories for why large-scale machine learning models generalize despite being vastly
overparameterized, which of their assumptions are needed to capture the qualitative …
Deterministic equivalent and error universality of deep random features learning
This manuscript considers the problem of learning a random Gaussian network function
using a fully connected network with frozen intermediate layers and trainable readout layer …
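A toy version of this setup, with a frozen random first layer and a ridge-trained readout, might look like the sketch below; the dimensions, nonlinearity, and teacher are illustrative assumptions, not the paper's exact model or asymptotic regime.

    # Sketch: frozen random features (one hidden layer), trainable linear readout
    # fitted by ridge regression on a toy target. All sizes are illustrative.
    import numpy as np

    rng = np.random.default_rng(1)
    n, d, p = 400, 30, 600
    X = rng.standard_normal((n, d))
    teacher = rng.standard_normal(d)
    y = np.tanh(X @ teacher / np.sqrt(d))          # toy target function

    W = rng.standard_normal((d, p)) / np.sqrt(d)   # frozen random weights
    Phi = np.tanh(X @ W)                           # fixed feature map
    lam = 1e-2
    a = np.linalg.solve(Phi.T @ Phi + lam * np.eye(p), Phi.T @ y)  # readout
    print("train MSE:", np.mean((Phi @ a - y) ** 2))
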
Bayes-optimal learning of deep random networks of extensive-width
We consider the problem of learning a target function corresponding to a deep,
extensive-width, non-linear neural network with random Gaussian weights. We consider the asymptotic …
How two-layer neural networks learn, one (giant) step at a time
We investigate theoretically how the features of a two-layer neural network adapt to the
structure of the target function through a few large batch gradient descent steps, leading to …
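The flavor of the procedure (one large full-batch gradient step on the first-layer weights, then refitting the readout) can be sketched as follows; the learning-rate scale, target function, and dimensions are illustrative choices, not the paper's.

    # Sketch: one "giant" full-batch gradient step on the first layer of a
    # two-layer network f(x) = a . tanh(W x), then refit the readout by ridge.
    import numpy as np

    rng = np.random.default_rng(2)
    n, d, p = 2000, 50, 100
    X = rng.standard_normal((n, d))
    w_star = rng.standard_normal(d) / np.sqrt(d)
    y = (X @ w_star) ** 2 - 1                      # toy single-index target

    W = rng.standard_normal((p, d)) / np.sqrt(d)   # first-layer weights
    a = rng.standard_normal(p) / np.sqrt(p)        # readout weights

    # Gradient of the mean squared loss with respect to W
    pre = X @ W.T                                  # (n, p) pre-activations
    err = np.tanh(pre) @ a - y                     # (n,) residuals
    grad_W = ((err[:, None] * (1 - np.tanh(pre) ** 2)) * a).T @ X / n

    eta = np.sqrt(p)                               # one large step
    W = W - eta * grad_W

    # Refit the readout on the updated features with ridge regression.
    Phi = np.tanh(X @ W.T)
    a = np.linalg.solve(Phi.T @ Phi + 1e-2 * np.eye(p), Phi.T @ y)
    print("train MSE after one giant step:", np.mean((Phi @ a - y) ** 2))
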
On the asymptotic learning curves of kernel ridge regression under power-law decay
The widely observed 'benign overfitting phenomenon' in the neural network literature raises
a challenge to the 'bias-variance trade-off' doctrine in statistical learning theory. Since …
Are Gaussian data all you need? The extents and limits of universality in high-dimensional generalized linear estimation
In this manuscript we consider the problem of generalized linear estimation on Gaussian
mixture data with labels given by a single-index model. Our first result is a sharp asymptotic …
On the optimality of misspecified kernel ridge regression
In the misspecified kernel ridge regression problem, researchers usually assume the
underlying true function $f_{\rho}^{\star} \in [\mathcal{H}]^{s}$, a less-smooth …
Spectrum of inner-product kernel matrices in the polynomial regime and multiple descent phenomenon in kernel ridge regression
T. Misiakiewicz, arXiv preprint arXiv:2204.10425, 2022.
We study the spectrum of inner-product kernel matrices, i.e., $n \times n$ matrices with
entries $h(\langle \textbf{x}_i, \textbf{x}_j \rangle / d)$ where the $(\textbf{x}_i)_{i \leq n}$ …
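As a small numerical companion (illustrative only, with an arbitrary choice of h and of the sizes), one can form such a kernel matrix and look at its eigenvalues:

    # Sketch: build an inner-product kernel matrix K_ij = h(<x_i, x_j> / d)
    # for an example nonlinearity h and inspect its spectrum.
    import numpy as np

    rng = np.random.default_rng(3)
    n, d = 500, 250                      # n comparable to a power of d
    X = rng.standard_normal((n, d))

    h = lambda t: np.exp(t)              # an example choice of h
    K = h(X @ X.T / d)                   # symmetric kernel matrix

    eigvals = np.linalg.eigvalsh(K)
    print("largest eigenvalues:", np.sort(eigvals)[-5:])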