Statistical mechanics of deep learning

Y Bahri, J Kadmon, J Pennington… - Annual Review of …, 2020 - annualreviews.org
The recent striking success of deep neural networks in machine learning raises profound
questions about the theoretical principles that underlie it. For example, what can …

Overview frequency principle/spectral bias in deep learning

ZQJ Xu, Y Zhang, T Luo - Communications on Applied Mathematics and …, 2024 - Springer
Understanding deep learning has become increasingly important as it penetrates further into
industry and science. In recent years, a line of research based on Fourier analysis has shed light on …
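
The phenomenon in question is the tendency of networks to fit low-frequency components of a target before high-frequency ones. A minimal toy sketch of it (my own illustrative setup, not code from the paper; the architecture, target, and hyperparameters are arbitrary choices):

```python
# Toy illustration of the frequency principle / spectral bias (illustrative only).
# A small tanh MLP is trained on a 1D target with one low- and one high-frequency
# sine; the residual spectrum shows the low-frequency error shrinking first.
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
x = jnp.linspace(0.0, 1.0, 256)[:, None]
y = jnp.sin(2 * jnp.pi * 1 * x) + 0.5 * jnp.sin(2 * jnp.pi * 16 * x)

def init(key, sizes=(1, 128, 128, 1)):
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (d_in, d_out)) / jnp.sqrt(d_in),
                       jnp.zeros(d_out)))
    return params

def mlp(params, x):
    h = x
    for W, b in params[:-1]:
        h = jnp.tanh(h @ W + b)
    W, b = params[-1]
    return h @ W + b

loss = lambda p: jnp.mean((mlp(p, x) - y) ** 2)
grad_fn = jax.jit(jax.grad(loss))

params, lr = init(key), 1e-2
for step in range(5001):
    grads = grad_fn(params)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    if step % 1000 == 0:
        spec = jnp.abs(jnp.fft.rfft((mlp(params, x) - y).squeeze()))
        # Frequency bin 1 (the low mode) should decay well before bin 16.
        print(step, float(spec[1]), float(spec[16]))
```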

Deep learning: a statistical viewpoint

PL Bartlett, A Montanari, A Rakhlin - Acta numerica, 2021 - cambridge.org
The remarkable practical success of deep learning has revealed some major surprises from
a theoretical perspective. In particular, simple gradient methods easily find near-optimal …

Wide neural networks of any depth evolve as linear models under gradient descent

J Lee, L Xiao, S Schoenholz, Y Bahri… - Advances in neural …, 2019 - proceedings.neurips.cc
A longstanding goal in deep learning research has been to precisely characterize training
and generalization. However, the often complex loss landscapes of neural networks have …
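
The title's claim can be summarized in standard neural-tangent-kernel notation (a textbook-style gloss, not a quotation from the paper). Writing \(\theta_0\) for the parameters at initialization, the linearized model is
\[
f_{\mathrm{lin}}(x;\theta) = f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^{\top}(\theta - \theta_0),
\]
and under continuous-time gradient descent with rate \(\eta\) on the squared loss over training inputs \(\mathcal{X}\) with targets \(\mathcal{Y}\), its predictions obey
\[
\frac{d f_t(\mathcal{X})}{dt} = -\eta\,\hat\Theta(\mathcal{X},\mathcal{X})\,\big(f_t(\mathcal{X}) - \mathcal{Y}\big),
\qquad
\hat\Theta(x,x') = \nabla_\theta f(x;\theta_0)^{\top}\,\nabla_\theta f(x';\theta_0),
\]
so that \(f_t(\mathcal{X}) = \mathcal{Y} + e^{-\eta\,\hat\Theta t}\big(f_0(\mathcal{X}) - \mathcal{Y}\big)\). As width grows, the empirical kernel \(\hat\Theta\) stops moving during training, so the full nonlinear network tracks this linear model.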

Exploring deep neural networks via layer-peeled model: Minority collapse in imbalanced training

C Fang, H He, Q Long, WJ Su - Proceedings of the National …, 2021 - National Acad Sciences
In this paper, we introduce the Layer-Peeled Model, a nonconvex, yet analytically tractable,
optimization program, in a quest to better understand deep neural networks that are trained …
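
For orientation, the "layer-peeled" idea is to peel off the last layer and treat the last-layer features as free optimization variables alongside the classifier. A schematic form of the resulting program (constants and constraint conventions may differ from the paper's) is
\[
\min_{W,\,H}\ \frac{1}{N}\sum_{k=1}^{K}\sum_{i=1}^{n_k}\mathcal{L}\big(W h_{k,i},\,y_k\big)
\quad\text{s.t.}\quad
\frac{1}{K}\sum_{k=1}^{K}\|w_k\|_2^2 \le E_W,
\qquad
\frac{1}{K}\sum_{k=1}^{K}\frac{1}{n_k}\sum_{i=1}^{n_k}\|h_{k,i}\|_2^2 \le E_H,
\]
where \(h_{k,i}\) is the free feature vector of the \(i\)-th example in class \(k\), \(w_k\) is the \(k\)-th row of the classifier \(W\), and \(E_W, E_H\) are norm budgets. Analyzing this program with imbalanced class sizes \(n_k\) is what leads to the "minority collapse" of the title.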

Neural collapse under MSE loss: Proximity to and dynamics on the central path

XY Han, V Papyan, DL Donoho - arXiv preprint arXiv:2106.02073, 2021 - arxiv.org
The recently discovered Neural Collapse (NC) phenomenon occurs pervasively in today's
deep net training paradigm of driving cross-entropy (CE) loss towards zero. During NC, last …
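
For readers new to the term, a standard summary (not text from this paper): during neural collapse the within-class variability of last-layer features vanishes, \(h_{k,i}\to\mu_k\), and the centered class means spread out into a simplex equiangular tight frame,
\[
\left\langle \frac{\mu_k-\mu_G}{\|\mu_k-\mu_G\|},\ \frac{\mu_{k'}-\mu_G}{\|\mu_{k'}-\mu_G\|} \right\rangle
= \begin{cases} 1, & k = k',\\[2pt] -\dfrac{1}{K-1}, & k \ne k', \end{cases}
\]
where \(\mu_G\) is the global feature mean and \(K\) the number of classes; the classifier rows align with the corresponding centered means. The "central path" of the title is, roughly, the trajectory along which the last-layer classifier remains the least-squares classifier for the current features, and the paper studies how closely MSE training stays to it.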

Modeling the influence of data structure on learning in neural networks: The hidden manifold model

S Goldt, M Mézard, F Krzakala, L Zdeborová - Physical Review X, 2020 - APS
Understanding the reasons for the success of deep neural networks trained using stochastic
gradient-based methods is a key open problem for the nascent theory of deep learning. The …

Dynamics of finite width kernel and prediction fluctuations in mean field neural networks

B Bordelon, C Pehlevan - Advances in Neural Information …, 2024 - proceedings.neurips.cc
We analyze the dynamics of finite-width effects in wide but finite feature-learning neural
networks. Starting from a dynamical mean-field theory description of infinite-width deep …

Finite depth and width corrections to the neural tangent kernel

B Hanin, M Nica - arXiv preprint arXiv:1909.05989, 2019 - arxiv.org
We prove the precise scaling, at finite depth and width, for the mean and variance of the
neural tangent kernel (NTK) in a randomly initialized ReLU network. The standard deviation …

The Gaussian equivalence of generative models for learning with shallow neural networks

S Goldt, B Loureiro, G Reeves… - Mathematical and …, 2022 - proceedings.mlr.press
Understanding the impact of data structure on the computational tractability of learning is a
key challenge for the theory of neural networks. Many theoretical works do not explicitly …