Neural networks can learn representations with gradient descent

A Damian, J Lee… - Conference on Learning …, 2022 - proceedings.mlr.press
Significant theoretical work has established that in specific regimes, neural networks trained
by gradient descent behave like kernel methods. However, in practice, it is known that …
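A brief note for orientation (standard background, not quoted from the abstract): the kernel-regime statement refers to the fact that, under suitable scaling, a network trained by gradient descent stays close to its linearization around initialization,

f(x; \theta) \approx f(x; \theta_0) + \langle \nabla_\theta f(x; \theta_0),\ \theta - \theta_0 \rangle,

so training behaves like regression with the tangent kernel K(x, x') = \langle \nabla_\theta f(x; \theta_0), \nabla_\theta f(x'; \theta_0) \rangle.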

What can a single attention layer learn? A study through the random features lens

H Fu, T Guo, Y Bai, S Mei - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Attention layers, which map a sequence of inputs to a sequence of outputs, are core
building blocks of the Transformer architecture, which has achieved significant …
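As background on the object being studied (a standard single-head formulation, not taken from the snippet), an attention layer acting on a sequence X \in \mathbb{R}^{n \times d} can be written as

\mathrm{Attn}(X) = \mathrm{softmax}\!\left( X W_Q (X W_K)^\top / \sqrt{d} \right) X W_V,

with the softmax applied row-wise and W_Q, W_K, W_V learned projection matrices; a random-features viewpoint typically fixes some of these weights at random rather than training them.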

Learning single-index models with shallow neural networks

A Bietti, J Bruna, C Sanford… - Advances in Neural …, 2022 - proceedings.neurips.cc
Single-index models are a class of functions given by an unknown univariate "link" function
applied to an unknown one-dimensional projection of the input. These models are …
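In the standard notation for this class (an illustration, not part of the truncated abstract), a single-index model is

f(x) = g(\langle w, x \rangle), \quad x \in \mathbb{R}^d,

where both the projection direction w and the univariate link g are unknown and must be learned from data.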

SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics

E Abbe, EB Adsera… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We investigate the time complexity of SGD learning on fully-connected neural networks with
isotropic data. We put forward a complexity measure, the leap, which measures how …
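For intuition about the complexity measure named here (an illustrative example in the spirit of this line of work, not quoted from the abstract): the staircase polynomial x_1 + x_1 x_2 + x_1 x_2 x_3 has leap 1, since each new monomial adds a single coordinate to the support already reached, while the isolated parity x_1 x_2 \cdots x_k has leap k; the leap is meant to capture how long SGD lingers at successive saddles before picking up the next coordinate.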

Reward-directed conditional diffusion: Provable distribution estimation and reward improvement

H Yuan, K Huang, C Ni, M Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc
We explore the methodology and theory of reward-directed generation via conditional
diffusion models. Directed generation aims to generate samples with desired properties as …

Feature learning in deep classifiers through intermediate neural collapse

A Rangamani, M Lindegaard… - International …, 2023 - proceedings.mlr.press
In this paper, we conduct an empirical study of the feature learning process in deep
classifiers. Recent research has identified a training phenomenon called Neural Collapse …

Provable guarantees for neural networks via gradient feature learning

Z Shi, J Wei, Y Liang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …

Beyond NTK with vanilla gradient descent: A mean-field analysis of neural networks with polynomial width, samples, and time

A Mahankali, H Zhang, K Dong… - Advances in Neural …, 2023 - proceedings.neurips.cc
Despite recent theoretical progress on the non-convex optimization of two-layer neural
networks, it is still an open question whether gradient descent on neural networks without …

The staircase property: How hierarchical structure can guide deep learning

E Abbe, E Boix-Adsera, MS Brennan… - Advances in …, 2021 - proceedings.neurips.cc
This paper identifies a structural property of data distributions that enables deep neural
networks to learn hierarchically. We define the "staircase" property for functions over the …
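To make the named property concrete (a standard example from this line of work, not taken from the truncated abstract): over the Boolean hypercube, the function

f(x) = x_1 + x_1 x_2 + x_1 x_2 x_3

satisfies the staircase property, since each monomial extends the support of an earlier one by exactly one coordinate, so a network can learn the low-degree terms first and climb to the higher-degree ones.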

Deep equals shallow for ReLU networks in kernel regimes

A Bietti, F Bach - arXiv preprint arXiv:2009.14397, 2020 - arxiv.org
Deep networks are often considered to be more expressive than shallow ones in terms of
approximation. Indeed, certain functions can be approximated by deep networks provably …