Generalization in kernel regression under realistic assumptions
D Barzilai, O Shamir - arXiv preprint arXiv:2312.15995, 2023 - arxiv.org
It is by now well-established that modern over-parameterized models seem to elude the bias-
variance tradeoff and generalize well despite overfitting noise. Many recent works attempt to …
The phase diagram of kernel interpolation in large dimensions
The generalization ability of kernel interpolation in large dimensions, i.e., n ≍ d^γ for some γ > 0, might be one of the most interesting problems in the recent renaissance of kernel …
How do noise tails impact on deep ReLU networks?
The Annals of Statistics, 2024, Vol. 52, No. 4, 1845–1871 - doi.org/10.1214/24-AOS2428
Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond
Deep learning sometimes appears to work in unexpected ways. In pursuit of a deeper
understanding of its surprising behaviors, we investigate the utility of a simple yet accurate …
On the Impacts of the Random Initialization in the Neural Tangent Kernel Theory
G Chen, Y Li, Q Lin - arXiv preprint arXiv:2410.05626, 2024 - arxiv.org
This paper discusses the impact of the random initialization of neural networks in the neural tangent kernel (NTK) theory, an impact that most recent works on the NTK ignore …
Towards a Statistical Understanding of Neural Networks: Beyond the Neural Tangent Kernel Theories
H Zhang, J Lai, Y Li, Q Lin, JS Liu - arXiv preprint arXiv:2412.18756, 2024 - arxiv.org
A primary advantage of neural networks lies in their feature learning characteristics, which are challenging to analyze theoretically due to the complexity of their training dynamics. We …
Benign Overfitting for Regression with Trained Two-Layer ReLU Networks
We study the least-squares regression problem with a two-layer fully-connected neural network with ReLU activation function, trained by gradient flow. Our first result is a …
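To make the setting of this entry concrete, here is a minimal sketch, assuming synthetic 1-D data and small-step gradient descent as a crude stand-in for gradient flow; the width, step size, and target function are illustrative choices, not the paper's construction.

```python
# Minimal sketch (illustrative assumptions, not the paper's setting): a two-layer
# fully-connected ReLU network trained on least-squares regression with small-step
# gradient descent, used here as a discretization of gradient flow.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D regression data (illustrative).
n, d, width = 32, 1, 256
X = rng.uniform(-1.0, 1.0, size=(n, d))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.standard_normal(n)

# Two-layer network f(x) = a^T relu(W x + b), standard random initialization.
W = rng.standard_normal((width, d)) / np.sqrt(d)
b = 0.1 * rng.standard_normal(width)
a = rng.standard_normal(width) / np.sqrt(width)

def forward(X):
    H = np.maximum(X @ W.T + b, 0.0)         # hidden ReLU activations, shape (n, width)
    return H, H @ a                          # predictions, shape (n,)

lr, steps = 1e-2, 5000                       # small step size ~ gradient flow
for t in range(steps):
    H, pred = forward(X)
    resid = pred - y                         # residuals of the least-squares loss
    loss = 0.5 * np.mean(resid ** 2)
    # Backpropagate the mean-squared-error loss through both layers.
    grad_a = H.T @ resid / n
    grad_H = np.outer(resid, a) * (H > 0)    # gradient through the ReLU
    grad_W = grad_H.T @ X / n
    grad_b = grad_H.sum(axis=0) / n
    a -= lr * grad_a
    W -= lr * grad_W
    b -= lr * grad_b

print(f"final training loss: {loss:.4f}")    # loss near zero indicates (near-)interpolation
```

Driving the training loss to (near) zero on noisy labels is the overfitting regime these benign-overfitting results concern; the interesting question, which the sketch does not answer, is how the resulting interpolant behaves on fresh test points.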
Overfitting Behaviour of Gaussian Kernel Ridgeless Regression: Varying Bandwidth or Dimensionality
We consider the overfitting behavior of minimum norm interpolating solutions of Gaussian kernel ridge regression (i.e., kernel ridgeless regression), when the bandwidth or input …
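The object studied in this entry, the minimum-norm (ridgeless) interpolant with a Gaussian kernel, can be sketched as follows; the data distribution, dimension, and bandwidth are illustrative assumptions, not the paper's regimes.

```python
# Minimal sketch of Gaussian-kernel ridgeless (minimum-norm interpolating)
# regression: the interpolant is f(x) = k(x, X) K^{-1} y, where K is the
# Gaussian kernel Gram matrix on the training inputs. All constants below
# are illustrative.
import numpy as np

def gaussian_kernel(A, B, bandwidth):
    # k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

rng = np.random.default_rng(0)
n, d, bandwidth, noise = 200, 5, 1.0, 0.5

X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + noise * rng.standard_normal(n)     # noisy training labels

K = gaussian_kernel(X, X, bandwidth)
alpha = np.linalg.solve(K, y)            # ridgeless: no regularization term

X_test = rng.standard_normal((1000, d))
y_test = np.sin(X_test[:, 0])            # noiseless test targets
pred = gaussian_kernel(X_test, X, bandwidth) @ alpha

print("max train residual (interpolation):", np.abs(K @ alpha - y).max())
print("test MSE:", np.mean((pred - y_test) ** 2))
```

Re-running this with different values of `bandwidth` or `d` is one way to probe the kind of dependence the entry refers to: the interpolation property on the training set is unchanged, while the test error can vary substantially.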
A Comprehensive Analysis on the Learning Curve in Kernel Ridge Regression
This paper conducts a comprehensive study of the learning curves of kernel ridge
regression (KRR) under minimal assumptions. Our contributions are three-fold: 1) we …
Minimum-Norm Interpolation Under Covariate Shift
Transfer learning is a critical part of real-world machine learning deployments and has been
extensively studied in experimental works with overparameterized neural networks …
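The estimator named in this entry's title, the minimum-norm interpolator evaluated under a covariate distribution that differs from the training one, admits a short sketch; the anisotropic Gaussian covariates and the particular shift below are illustrative assumptions, not the paper's setup.

```python
# Minimal sketch of minimum-norm linear interpolation under covariate shift
# (illustrative assumptions): theta_hat = X^+ y in an overparameterized regime
# (d > n), with excess risk measured on both the source and a shifted target
# covariate distribution.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 400                               # overparameterized: d > n
theta_star = rng.standard_normal(d) / np.sqrt(d)

# Source covariates: anisotropic Gaussian; target: per-coordinate scales reversed.
source_scale = np.linspace(1.0, 0.1, d)
target_scale = source_scale[::-1]             # a simple covariate shift

X = rng.standard_normal((n, d)) * source_scale
y = X @ theta_star + 0.1 * rng.standard_normal(n)

theta_hat = np.linalg.pinv(X) @ y             # minimum-norm interpolator

def excess_risk(scale, m=5000):
    # Monte Carlo estimate of E[(x^T (theta_hat - theta_star))^2] under the given scales.
    Xe = rng.standard_normal((m, d)) * scale
    return np.mean((Xe @ (theta_hat - theta_star)) ** 2)

print("max train residual (interpolation):", np.abs(X @ theta_hat - y).max())
print("source excess risk:", excess_risk(source_scale))
print("target excess risk (shifted):", excess_risk(target_scale))
```

The gap between the two printed risks is the quantity of interest here: the same interpolator can look benign under the source covariate distribution and considerably worse (or better) under a shifted one.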