On the implicit bias in deep-learning algorithms

G Vardi - Communications of the ACM, 2023 - dl.acm.org
Deep learning has been highly successful in recent years and has led to dramatic
improvements in multiple domains …

A farewell to the bias-variance tradeoff? an overview of the theory of overparameterized machine learning

Y Dar, V Muthukumar, RG Baraniuk - arXiv preprint arXiv:2109.02355, 2021 - arxiv.org
The rapid recent progress in machine learning (ML) has raised a number of scientific
questions that challenge the longstanding dogma of the field. One of the most important …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Towards understanding sharpness-aware minimization

M Andriushchenko… - … Conference on Machine …, 2022 - proceedings.mlr.press
Sharpness-Aware Minimization (SAM) is a recent training method that relies on
worst-case weight perturbations, which significantly improves generalization in various …
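For context on the worst-case perturbation step the abstract refers to, here is a minimal sketch of a single SAM update on a toy least-squares objective; the quadratic loss, the perturbation radius rho, and the step size lr are illustrative assumptions rather than the paper's experimental setup.

import numpy as np

def loss_grad(w, X, y):
    # Gradient of (1/2n) * ||X w - y||^2 with respect to w.
    return X.T @ (X @ w - y) / len(y)

def sam_step(w, X, y, lr=0.1, rho=0.05):
    g = loss_grad(w, X, y)
    # Ascent: move to the approximate worst-case point in an L2 ball of radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Descent: update the original weights with the gradient taken at the perturbed point.
    return w - lr * loss_grad(w + eps, X, y)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 4)), rng.normal(size=32)
w = np.zeros(4)
for _ in range(200):
    w = sam_step(w, X, y)

The two gradient evaluations per step (one at the perturbed weights, one used only to find the perturbation) are the defining cost of SAM relative to plain gradient descent.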

Reconstructing training data from trained neural networks

N Haim, G Vardi, G Yehudai… - Advances in Neural …, 2022 - proceedings.neurips.cc
Understanding to what extent neural networks memorize training data is an intriguing
question with practical and theoretical implications. In this paper we show that in some …

Generalized federated learning via sharpness aware minimization

Z Qu, X Li, R Duan, Y Liu, B Tang… - … conference on machine …, 2022 - proceedings.mlr.press
Federated Learning (FL) is a promising framework for performing privacy-preserving,
distributed learning with a set of clients. However, the data distribution among clients often …

Orthogonal representations for robust context-dependent task performance in brains and neural networks

T Flesch, K Juechems, T Dumbalska, A Saxe… - Neuron, 2022 - cell.com
How do neural populations code for multiple, potentially conflicting tasks? Here we used
computational simulations involving neural networks to define "lazy" and "rich" coding …

Understanding gradient descent on the edge of stability in deep learning

S Arora, Z Li, A Panigrahi - International Conference on …, 2022 - proceedings.mlr.press
Deep learning experiments by Cohen et al. (2021) using deterministic Gradient
Descent (GD) revealed an Edge of Stability (EoS) phase when learning rate (LR) and …
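As background for the threshold the EoS terminology refers to (a standard fact about gradient descent, not this paper's contribution): on a one-dimensional quadratic, the GD iteration is stable exactly when the step size times the curvature is below 2. In the Cohen et al. experiments the measured sharpness rises to roughly 2/LR and then hovers there while the loss keeps decreasing.

% Stability of gradient descent on a quadratic with curvature (sharpness) \lambda.
\[
  f(x) = \tfrac{\lambda}{2}\,x^{2}, \qquad
  x_{t+1} = x_t - \eta f'(x_t) = (1 - \eta\lambda)\,x_t,
\]
\[
  |1 - \eta\lambda| < 1 \;\Longleftrightarrow\; \lambda < \frac{2}{\eta}.
\]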

High-dimensional asymptotics of feature learning: How one gradient step improves the representation

J Ba, MA Erdogdu, T Suzuki, Z Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a
two-layer neural network: $f(\boldsymbol{x})=\frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma$ …
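To make the object of study concrete, here is a minimal numpy sketch of one full-batch gradient step on the first-layer weights $\boldsymbol{W}$ of such a two-layer network, with $\sigma=\tanh$, the second-layer vector $\boldsymbol{a}$ held fixed, and a squared-error loss; the dimensions, activation, and step size are illustrative assumptions rather than the paper's asymptotic scaling.

import numpy as np

rng = np.random.default_rng(0)
d, N, n = 8, 16, 64                          # input dim, width, sample size (illustrative)
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
W = rng.normal(size=(N, d)) / np.sqrt(d)     # first-layer weights (trained)
a = rng.normal(size=N)                       # second-layer weights (held fixed)

pre = X @ W.T                                # pre-activations, shape (n, N)
f = np.tanh(pre) @ a / np.sqrt(N)            # network outputs f(x_i)
resid = f - y

# Gradient of L(W) = (1/2n) * sum_i (f(x_i) - y_i)^2 with respect to W.
grad_W = (resid[:, None] * (1.0 - np.tanh(pre) ** 2) * a).T @ X / (n * np.sqrt(N))

eta = 1.0
W_one_step = W - eta * grad_W                # first-layer weights after one gradient step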

Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation

M Belkin - Acta Numerica, 2021 - cambridge.org
In the past decade the mathematical theory of machine learning has lagged far behind the
triumphs of deep neural networks on practical challenges. However, the gap between theory …