A farewell to the bias-variance tradeoff? An overview of the theory of overparameterized machine learning

Y Dar, V Muthukumar, RG Baraniuk - arXiv preprint arXiv:2109.02355, 2021 - arxiv.org
The rapid recent progress in machine learning (ML) has raised a number of scientific
questions that challenge the longstanding dogma of the field. One of the most important …

Surprises in high-dimensional ridgeless least squares interpolation

T Hastie, A Montanari, S Rosset, RJ Tibshirani - Annals of Statistics, 2022 - ncbi.nlm.nih.gov
Interpolators—estimators that achieve zero training error—have attracted growing attention
in machine learning, mainly because state-of-the-art neural networks appear to be models of …

Benign overfitting in two-layer convolutional neural networks

Y Cao, Z Chen, M Belkin, Q Gu - Advances in neural …, 2022 - proceedings.neurips.cc
Modern neural networks often have great expressive power and can be trained to overfit the
training data, while still achieving a good test performance. This phenomenon is referred to …

Benign overfitting without linearity: Neural network classifiers trained by gradient descent for noisy linear data

S Frei, NS Chatterji, P Bartlett - Conference on Learning …, 2022 - proceedings.mlr.press
Benign overfitting, the phenomenon where interpolating models generalize well in the
presence of noisy data, was first observed in neural network models trained with gradient …

Classification vs regression in overparameterized regimes: Does the loss function matter?

V Muthukumar, A Narang, V Subramanian… - Journal of Machine …, 2021 - jmlr.org
We compare classification and regression tasks in an overparameterized linear model with
Gaussian features. On the one hand, we show that with sufficient overparameterization all …

Benign overfitting in two-layer ReLU convolutional neural networks

Y Kou, Z Chen, Y Chen, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
Modern deep learning models with great expressive power can be trained to overfit the
training data but still generalize well. This phenomenon is referred to as benign overfitting …

Characterizing datapoints via second-split forgetting

P Maini, S Garg, Z Lipton… - Advances in Neural …, 2022 - proceedings.neurips.cc
Researchers investigating example hardness have increasingly focused on the dynamics by
which neural networks learn and forget examples throughout training. Popular metrics …

Implicit bias of gradient descent for two-layer ReLU and leaky ReLU networks on nearly-orthogonal data

Y Kou, Z Chen, Q Gu - Advances in Neural Information …, 2024 - proceedings.neurips.cc
The implicit bias towards solutions with favorable properties is believed to be a key reason
why neural networks trained by gradient-based optimization can generalize well. While the …

A U-turn on double descent: Rethinking parameter counting in statistical learning

A Curth, A Jeffares… - Advances in Neural …, 2024 - proceedings.neurips.cc
Conventional statistical wisdom established a well-understood relationship between model
complexity and prediction error, typically presented as a U-shaped curve reflecting a …

Interpreting cis-regulatory mechanisms from genomic deep neural networks using surrogate models

EE Seitz, DM McCandlish, JB Kinney… - Nature Machine …, 2024 - nature.com
Deep neural networks (DNNs) have greatly advanced the ability to predict genome function
from sequence. However, elucidating underlying biological mechanisms from genomic …