The shape of learning curves: a review
Learning curves provide insight into the dependence of a learner's generalization
performance on the training set size. This important tool can be used for model selection, to …
A farewell to the bias-variance tradeoff? an overview of the theory of overparameterized machine learning
Y Dar, V Muthukumar, RG Baraniuk - arXiv preprint arXiv:2109.02355, 2021 - arxiv.org
The rapid recent progress in machine learning (ML) has raised a number of scientific
questions that challenge the longstanding dogma of the field. One of the most important …
Bayesian deep learning and a probabilistic perspective of generalization
AG Wilson, P Izmailov - Advances in neural information …, 2020 - proceedings.neurips.cc
The key distinguishing property of a Bayesian approach is marginalization, rather than using
a single setting of weights. Bayesian marginalization can particularly improve the accuracy …
Towards understanding grokking: An effective theory of representation learning
We aim to understand grokking, a phenomenon where models generalize long after
overfitting their training set. We present both a microscopic analysis anchored by an effective …
Direct parameterization of Lipschitz-bounded deep networks
R Wang, I Manchester - International Conference on …, 2023 - proceedings.mlr.press
This paper introduces a new parameterization of deep neural networks (both fully-connected
and convolutional) with guaranteed $\ell^2$ Lipschitz bounds, i.e., limited sensitivity to input …
Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks
A theoretical understanding of generalization remains an open problem for many machine
learning models, including deep networks where overparameterization leads to better …
Learning Curves for Decision Making in Supervised Machine Learning--A Survey
F Mohr, JN van Rijn - arXiv preprint arXiv:2201.12150, 2022 - arxiv.org
Learning curves are a concept from social sciences that has been adopted in the context of
machine learning to assess the performance of a learning algorithm with respect to a certain …
Data feedback loops: Model-driven amplification of dataset biases
R Taori, T Hashimoto - International Conference on Machine …, 2023 - proceedings.mlr.press
Datasets scraped from the internet have been critical to large-scale machine learning. Yet,
this success puts the utility of future internet-derived datasets at potential risk, as model …
Benign overfitting of constant-stepsize sgd for linear regression
There is an increasing realization that algorithmic inductive biases are central in preventing
overfitting; empirically, we often see a benign overfitting phenomenon in overparameterized …
Shape matters: Understanding the implicit bias of the noise covariance
The noise in stochastic gradient descent (SGD) provides a crucial implicit regularization
effect for training overparameterized models. Prior theoretical work largely focuses on …