Neural networks can learn representations with gradient descent
Significant theoretical work has established that in specific regimes, neural networks trained
by gradient descent behave like kernel methods. However, in practice, it is known that …
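For context, the kernel that these regimes attach to a network f(x; \theta) is the neural tangent kernel; a standard formulation (stated as background, not quoted from this abstract) is

    K(x, x') = \langle \nabla_\theta f(x; \theta_0), \nabla_\theta f(x'; \theta_0) \rangle,

so that gradient descent near initialization \theta_0 tracks kernel regression with K.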
What can a single attention layer learn? A study through the random features lens
Attention layers, which map a sequence of inputs to a sequence of outputs, are core
building blocks of the Transformer architecture, which has achieved significant …
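As a concrete sketch of the sequence-to-sequence map such a layer computes, here is minimal single-head softmax attention in NumPy; the names and shapes are illustrative, not taken from the paper.

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def attention_layer(X, W_q, W_k, W_v):
        """Map a sequence of n input vectors (rows of X) to n output vectors."""
        Q, K, V = X @ W_q, X @ W_k, X @ W_v        # query/key/value projections
        scores = Q @ K.T / np.sqrt(K.shape[-1])    # scaled pairwise similarities
        return softmax(scores, axis=-1) @ V        # attention-weighted average of values

    rng = np.random.default_rng(0)
    n, d = 5, 8                                    # sequence length, embedding width
    X = rng.normal(size=(n, d))
    W_q, W_k, W_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
    Y = attention_layer(X, W_q, W_k, W_v)          # Y.shape == (n, d)

The random-features lens of the title roughly corresponds to studying this map with the query/key projections frozen at their random initialization.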
Learning single-index models with shallow neural networks
Single-index models are a class of functions given by an unknown univariate "link" function
applied to an unknown one-dimensional projection of the input. These models are …
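In symbols (standard notation for this model class, matching the snippet's description), a single-index model takes the form

    f(x) = g(\langle w, x \rangle), \qquad w \in \mathbb{R}^d \text{ unknown}, \quad g : \mathbb{R} \to \mathbb{R} \text{ an unknown link},

so learning amounts to recovering the direction w together with the univariate link g.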
SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics
We investigate the time complexity of SGD learning on fully-connected neural networks with
isotropic data. We put forward a complexity measure, the leap, which measures how …
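The snippet truncates the definition; as a hedged illustration drawn from this line of work rather than from the abstract itself, the leap of a function with Fourier support sets S_1, \dots, S_k measures the largest number of new coordinates any monomial introduces under the best ordering:

    \mathrm{Leap}(f) = \min_{\text{orderings}} \max_i \, \lvert S_i \setminus (S_1 \cup \cdots \cup S_{i-1}) \rvert.

For example, x_1 + x_1 x_2 has leap 1, while the isolated parity x_1 x_2 has leap 2.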
Reward-directed conditional diffusion: Provable distribution estimation and reward improvement
We explore the methodology and theory of reward-directed generation via conditional
diffusion models. Directed generation aims to generate samples with desired properties as …
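One standard mechanism for directed generation (stated as background, not necessarily the exact construction of this paper) biases the diffusion sampler's score with the gradient of a learned reward model r:

    \nabla_x \log p_t(x \mid \text{high reward}) \approx \nabla_x \log p_t(x) + \lambda \, \nabla_x r(x),

where \lambda trades off sample fidelity against reward improvement.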
Feature learning in deep classifiers through intermediate neural collapse
A Rangamani, M Lindegaard… - International …, 2023 - proceedings.mlr.press
In this paper, we conduct an empirical study of the feature learning process in deep
classifiers. Recent research has identified a training phenomenon called Neural Collapse …
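For context, Neural Collapse is usually summarized by properties of the last-layer features at the end of training; the first of these (NC1), stated here as standard background rather than quoted from this abstract, is vanishing within-class variability relative to between-class spread:

    \mathrm{tr}\left( \Sigma_W \Sigma_B^{\dagger} \right) \to 0,

where \Sigma_W and \Sigma_B are the within-class and between-class covariance matrices of the features.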
Provable guarantees for neural networks via gradient feature learning
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …
Beyond NTK with vanilla gradient descent: A mean-field analysis of neural networks with polynomial width, samples, and time
Despite recent theoretical progress on the non-convex optimization of two-layer neural
networks, it is still an open question whether gradient descent on neural networks without …
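The mean-field regime of the title typically refers to the following two-layer parameterization (standard in this literature, stated as context):

    f(x) = \frac{1}{m} \sum_{i=1}^{m} a_i \, \sigma(\langle w_i, x \rangle),

whose 1/m scaling, in contrast to the 1/\sqrt{m} scaling of the NTK regime, lets the distribution of neurons move substantially during training, i.e., lets the network learn features.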
The staircase property: How hierarchical structure can guide deep learning
E Abbe, E Boix-Adsera, MS Brennan… - Advances in …, 2021 - proceedings.neurips.cc
This paper identifies a structural property of data distributions that enables deep neural
networks to learn hierarchically. We define the "staircase" property for functions over the …
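As a hedged illustration (assuming the Boolean-hypercube setting of the full paper, which the snippet cuts off), a canonical staircase function introduces one new coordinate per monomial:

    f(x) = x_1 + x_1 x_2 + x_1 x_2 x_3 + \cdots + x_1 x_2 \cdots x_k, \qquad x \in \{-1, +1\}^d,

so each higher-degree term can be reached by a small step from the terms below it.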
Deep equals shallow for ReLU networks in kernel regimes
Deep networks are often considered to be more expressive than shallow ones in terms of
approximation. Indeed, certain functions can be approximated by deep networks provably …
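A classical instance of such a depth-separation statement (offered as background, not necessarily the example this paper analyzes) is Telgarsky's sawtooth construction: a piecewise-linear function with roughly 2^k oscillations on [0, 1] is computed exactly by a ReLU network of depth O(k) and constant width, yet any ReLU network of constant depth needs width 2^{\Omega(k)} to approximate it.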