Neural networks can learn representations with gradient descent

A Damian, J Lee… - Conference on Learning …, 2022 - proceedings.mlr.press
Significant theoretical work has established that in specific regimes, neural networks trained
by gradient descent behave like kernel methods. However, in practice, it is known that …
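A brief note for orientation (standard background, not quoted from the abstract): the kernel-regime statement refers to the fact that, under suitable scaling, a network trained by gradient descent stays close to its linearization around initialization,

f(x; \theta) \approx f(x; \theta_0) + \langle \nabla_\theta f(x; \theta_0),\ \theta - \theta_0 \rangle,

so training behaves like regression with the tangent kernel K(x, x') = \langle \nabla_\theta f(x; \theta_0), \nabla_\theta f(x'; \theta_0) \rangle.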

What can a single attention layer learn? A study through the random features lens

H Fu, T Guo, Y Bai, S Mei - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Attention layers, which map a sequence of inputs to a sequence of outputs, are core
building blocks of the Transformer architecture, which has achieved significant …
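As background on the object being studied (a standard single-head formulation, not taken from the snippet), an attention layer acting on a sequence X \in \mathbb{R}^{n \times d} can be written as

\mathrm{Attn}(X) = \mathrm{softmax}\!\left( X W_Q (X W_K)^\top / \sqrt{d} \right) X W_V,

with the softmax applied row-wise and W_Q, W_K, W_V learned projection matrices; a random-features viewpoint typically fixes some of these weights at random rather than training them.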

Learning single-index models with shallow neural networks

A Bietti, J Bruna, C Sanford… - Advances in Neural …, 2022 - proceedings.neurips.cc
Single-index models are a class of functions given by an unknown univariate "link" function
applied to an unknown one-dimensional projection of the input. These models are …
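In the standard notation for this class (an illustration, not part of the truncated abstract), a single-index model is

f(x) = g(\langle w, x \rangle), \quad x \in \mathbb{R}^d,

where both the projection direction w and the univariate link g are unknown and must be learned from data.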

SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics

E Abbe, EB Adsera… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We investigate the time complexity of SGD learning on fully-connected neural networks with
isotropic data. We put forward a complexity measure, the leap, which measures how …
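For intuition about the complexity measure named here (an illustrative example in the spirit of this line of work, not quoted from the abstract): the staircase polynomial x_1 + x_1 x_2 + x_1 x_2 x_3 has leap 1, since each new monomial adds a single coordinate to the support already reached, while the isolated parity x_1 x_2 \cdots x_k has leap k; the leap is meant to capture how long SGD lingers at successive saddles before picking up the next coordinate.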

Reward-directed conditional diffusion: Provable distribution estimation and reward improvement

H Yuan, K Huang, C Ni, M Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc
We explore the methodology and theory of reward-directed generation via conditional
diffusion models. Directed generation aims to generate samples with desired properties as …

Feature learning in deep classifiers through intermediate neural collapse

A Rangamani, M Lindegaard… - International …, 2023 - proceedings.mlr.press
In this paper, we conduct an empirical study of the feature learning process in deep
classifiers. Recent research has identified a training phenomenon called Neural Collapse …

Provable guarantees for neural networks via gradient feature learning

Z Shi, J Wei, Y Liang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …

Beyond NTK with vanilla gradient descent: A mean-field analysis of neural networks with polynomial width, samples, and time

A Mahankali, H Zhang, K Dong… - Advances in Neural …, 2023 - proceedings.neurips.cc
Despite recent theoretical progress on the non-convex optimization of two-layer neural
networks, it is still an open question whether gradient descent on neural networks without …

The staircase property: How hierarchical structure can guide deep learning

E Abbe, E Boix-Adsera, MS Brennan… - Advances in …, 2021 - proceedings.neurips.cc
This paper identifies a structural property of data distributions that enables deep neural
networks to learn hierarchically. We define the "staircase" property for functions over the …
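To make the named property concrete (a standard example from this line of work, not taken from the truncated abstract): over the Boolean hypercube, the function

f(x) = x_1 + x_1 x_2 + x_1 x_2 x_3

satisfies the staircase property, since each monomial extends the support of an earlier one by exactly one coordinate, so a network can learn the low-degree terms first and climb to the higher-degree ones.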

Deep equals shallow for ReLU networks in kernel regimes

A Bietti, F Bach - arXiv preprint arXiv:2009.14397, 2020 - arxiv.org
Deep networks are often considered to be more expressive than shallow ones in terms of
approximation. Indeed, certain functions can be approximated by deep networks provably …