Statistical mechanics of deep learning

Y Bahri, J Kadmon, J Pennington… - Annual Review of …, 2020 - annualreviews.org
The recent striking success of deep neural networks in machine learning raises profound
questions about the theoretical principles that underlie it. For example, what can …

Overview frequency principle/spectral bias in deep learning

ZQJ Xu, Y Zhang, T Luo - Communications on Applied Mathematics and …, 2024 - Springer
Understanding deep learning has become increasingly important as it penetrates further into
industry and science. In recent years, a line of research based on Fourier analysis has shed light on …
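
The phenomenon in question is the tendency of networks to fit low-frequency components of a target before high-frequency ones. A minimal toy sketch of it (my own illustrative setup, not code from the paper; the architecture, target, and hyperparameters are arbitrary choices):

```python
# Toy illustration of the frequency principle / spectral bias (illustrative only).
# A small tanh MLP is trained on a 1D target with one low- and one high-frequency
# sine; the residual spectrum shows the low-frequency error shrinking first.
import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
x = jnp.linspace(0.0, 1.0, 256)[:, None]
y = jnp.sin(2 * jnp.pi * 1 * x) + 0.5 * jnp.sin(2 * jnp.pi * 16 * x)

def init(key, sizes=(1, 128, 128, 1)):
    params = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        params.append((jax.random.normal(sub, (d_in, d_out)) / jnp.sqrt(d_in),
                       jnp.zeros(d_out)))
    return params

def mlp(params, x):
    h = x
    for W, b in params[:-1]:
        h = jnp.tanh(h @ W + b)
    W, b = params[-1]
    return h @ W + b

loss = lambda p: jnp.mean((mlp(p, x) - y) ** 2)
grad_fn = jax.jit(jax.grad(loss))

params, lr = init(key), 1e-2
for step in range(5001):
    grads = grad_fn(params)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    if step % 1000 == 0:
        spec = jnp.abs(jnp.fft.rfft((mlp(params, x) - y).squeeze()))
        # Frequency bin 1 (the low mode) should decay well before bin 16.
        print(step, float(spec[1]), float(spec[16]))
```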

Deep learning: a statistical viewpoint

PL Bartlett, A Montanari, A Rakhlin - Acta numerica, 2021 - cambridge.org
The remarkable practical success of deep learning has revealed some major surprises from
a theoretical perspective. In particular, simple gradient methods easily find near-optimal …

Wide neural networks of any depth evolve as linear models under gradient descent

J Lee, L Xiao, S Schoenholz, Y Bahri… - Advances in neural …, 2019 - proceedings.neurips.cc
A longstanding goal in deep learning research has been to precisely characterize training
and generalization. However, the often complex loss landscapes of neural networks have …
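
The title's claim can be summarized in standard neural-tangent-kernel notation (a textbook-style gloss, not a quotation from the paper). Writing \(\theta_0\) for the parameters at initialization, the linearized model is
\[
f_{\mathrm{lin}}(x;\theta) = f(x;\theta_0) + \nabla_\theta f(x;\theta_0)^{\top}(\theta - \theta_0),
\]
and under continuous-time gradient descent with rate \(\eta\) on the squared loss over training inputs \(\mathcal{X}\) with targets \(\mathcal{Y}\), its predictions obey
\[
\frac{d f_t(\mathcal{X})}{dt} = -\eta\,\hat\Theta(\mathcal{X},\mathcal{X})\,\big(f_t(\mathcal{X}) - \mathcal{Y}\big),
\qquad
\hat\Theta(x,x') = \nabla_\theta f(x;\theta_0)^{\top}\,\nabla_\theta f(x';\theta_0),
\]
so that \(f_t(\mathcal{X}) = \mathcal{Y} + e^{-\eta\,\hat\Theta t}\big(f_0(\mathcal{X}) - \mathcal{Y}\big)\). As width grows, the empirical kernel \(\hat\Theta\) stops moving during training, so the full nonlinear network tracks this linear model.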

Exploring deep neural networks via layer-peeled model: Minority collapse in imbalanced training

C Fang, H He, Q Long, WJ Su - Proceedings of the National …, 2021 - National Acad Sciences
In this paper, we introduce the Layer-Peeled Model, a nonconvex, yet analytically tractable,
optimization program, in a quest to better understand deep neural networks that are trained …
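
For orientation, the "layer-peeled" idea is to peel off the last layer and treat the last-layer features as free optimization variables alongside the classifier. A schematic form of the resulting program (constants and constraint conventions may differ from the paper's) is
\[
\min_{W,\,H}\ \frac{1}{N}\sum_{k=1}^{K}\sum_{i=1}^{n_k}\mathcal{L}\big(W h_{k,i},\,y_k\big)
\quad\text{s.t.}\quad
\frac{1}{K}\sum_{k=1}^{K}\|w_k\|_2^2 \le E_W,
\qquad
\frac{1}{K}\sum_{k=1}^{K}\frac{1}{n_k}\sum_{i=1}^{n_k}\|h_{k,i}\|_2^2 \le E_H,
\]
where \(h_{k,i}\) is the free feature vector of the \(i\)-th example in class \(k\), \(w_k\) is the \(k\)-th row of the classifier \(W\), and \(E_W, E_H\) are norm budgets. Analyzing this program with imbalanced class sizes \(n_k\) is what leads to the "minority collapse" of the title.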

Neural collapse under MSE loss: Proximity to and dynamics on the central path

XY Han, V Papyan, DL Donoho - arXiv preprint arXiv:2106.02073, 2021 - arxiv.org
The recently discovered Neural Collapse (NC) phenomenon occurs pervasively in today's
deep net training paradigm of driving cross-entropy (CE) loss towards zero. During NC, last …
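
For readers new to the term, a standard summary (not text from this paper): during neural collapse the within-class variability of last-layer features vanishes, \(h_{k,i}\to\mu_k\), and the centered class means spread out into a simplex equiangular tight frame,
\[
\left\langle \frac{\mu_k-\mu_G}{\|\mu_k-\mu_G\|},\ \frac{\mu_{k'}-\mu_G}{\|\mu_{k'}-\mu_G\|} \right\rangle
= \begin{cases} 1, & k = k',\\[2pt] -\dfrac{1}{K-1}, & k \ne k', \end{cases}
\]
where \(\mu_G\) is the global feature mean and \(K\) the number of classes; the classifier rows align with the corresponding centered means. The "central path" of the title is, roughly, the trajectory along which the last-layer classifier remains the least-squares classifier for the current features, and the paper studies how closely MSE training stays to it.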

Modeling the influence of data structure on learning in neural networks: The hidden manifold model

S Goldt, M Mézard, F Krzakala, L Zdeborová - Physical Review X, 2020 - APS
Understanding the reasons for the success of deep neural networks trained using stochastic
gradient-based methods is a key open problem for the nascent theory of deep learning. The …

Dynamics of finite width kernel and prediction fluctuations in mean field neural networks

B Bordelon, C Pehlevan - Advances in Neural Information …, 2024 - proceedings.neurips.cc
We analyze the dynamics of finite-width effects in wide but finite feature-learning neural
networks. Starting from a dynamical mean-field theory description of infinite-width deep …

Finite depth and width corrections to the neural tangent kernel

B Hanin, M Nica - arXiv preprint arXiv:1909.05989, 2019 - arxiv.org
We prove the precise scaling, at finite depth and width, for the mean and variance of the
neural tangent kernel (NTK) in a randomly initialized ReLU network. The standard deviation …

The Gaussian equivalence of generative models for learning with shallow neural networks

S Goldt, B Loureiro, G Reeves… - Mathematical and …, 2022 - proceedings.mlr.press
Understanding the impact of data structure on the computational tractability of learning is a
key challenge for the theory of neural networks. Many theoretical works do not explicitly …