Hidden progress in deep learning: SGD learns parities near the computational limit

B Barak, B Edelman, S Goel… - Advances in …, 2022 - proceedings.neurips.cc
There is mounting evidence of emergent phenomena in the capabilities of deep learning
methods as we scale up datasets, model sizes, and training times. While there are some …
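
For readers unfamiliar with the task in the title: an (n, k)-sparse parity labels a sign vector by the product of k fixed coordinates, so the label has no linear correlation with any single input bit. The sketch below is illustrative only, not the authors' experimental setup; the width, step count, step size, and hinge loss are arbitrary choices made here to show plain SGD on the first layer of a small ReLU network.

```python
# Illustrative sketch of the sparse-parity task (not the paper's exact setup).
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 20, 3, 10_000                            # input dimension, parity size, sample count
support = rng.choice(n, size=k, replace=False)     # hidden coordinate subset S

X = rng.choice([-1.0, 1.0], size=(m, n))           # uniform sign inputs
y = X[:, support].prod(axis=1)                     # label = prod_{i in S} x_i

w, lr = 128, 0.05                                  # hidden width and step size (arbitrary)
W = rng.normal(scale=1.0 / np.sqrt(n), size=(w, n))
a = rng.choice([-1.0, 1.0], size=w) / w            # second layer held fixed

for step in range(20_000):                         # one-sample SGD on the hinge loss
    i = rng.integers(m)
    x_i, y_i = X[i], y[i]
    h = np.maximum(W @ x_i, 0.0)                   # hidden ReLU activations
    if y_i * (a @ h) < 1.0:                        # subgradient step only if margin < 1
        W += lr * y_i * np.outer(a * (h > 0.0), x_i)

acc = np.mean(np.sign(np.maximum(X @ W.T, 0.0) @ a) == y)
print(f"train accuracy: {acc:.3f}")
```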

High-dimensional asymptotics of feature learning: How one gradient step improves the representation

J Ba, MA Erdogdu, T Suzuki, Z Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a
two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma$ …
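
The displayed model is cut off by the snippet. A plausible reconstruction of the standard two-layer parameterization used in this line of work, together with the single first-layer gradient step the title refers to, is given below; the exact scaling, loss, and what is held fixed may differ in the paper.

```latex
\[
  f(\boldsymbol{x}) \;=\; \frac{1}{\sqrt{N}}\,\boldsymbol{a}^{\top}
  \sigma\!\left(\boldsymbol{W}\boldsymbol{x}\right),
  \qquad \boldsymbol{W}\in\mathbb{R}^{N\times d},\;
  \boldsymbol{a}\in\mathbb{R}^{N},
\]
\[
  \boldsymbol{W}_{1} \;=\; \boldsymbol{W}_{0}
  \;-\; \eta\,\nabla_{\boldsymbol{W}}\,
  \widehat{\mathcal{L}}(f)\,\Big|_{\boldsymbol{W}=\boldsymbol{W}_{0}},
\]
```

Here $\widehat{\mathcal{L}}$ denotes an empirical loss over the training sample; the object of study is the feature matrix after one step on $\boldsymbol{W}$, with the second layer typically held fixed during that step.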

On the role of attention in prompt-tuning

S Oymak, AS Rawat, M Soltanolkotabi… - International …, 2023 - proceedings.mlr.press
Prompt-tuning is an emerging strategy to adapt large language models (LLMs) to
downstream tasks by learning a (soft-) prompt parameter from data. Despite its success in …

Provable guarantees for neural networks via gradient feature learning

Z Shi, J Wei, Y Liang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …

Neural networks efficiently learn low-dimensional representations with SGD

A Mousavi-Hosseini, S Park, M Girotti… - arXiv preprint arXiv …, 2022 - arxiv.org
We study the problem of training a two-layer neural network (NN) of arbitrary width using
stochastic gradient descent (SGD) where the input $\boldsymbol{x} \in \mathbb{R}^d$ is …
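
The low-dimensional structure referenced in the title is, in this line of work, typically a single- or multi-index target: the label depends on the d-dimensional input only through its projection onto a few hidden directions. The generator below is a hypothetical example of that form; the direction u, link function g, and noise level are placeholders, not the paper's choices.

```python
# Hypothetical single-index data: the label depends on x only through <u, x>,
# so u is the "low-dimensional representation" a trained first layer must recover.
import numpy as np

rng = np.random.default_rng(1)
d, m = 100, 5_000                          # ambient dimension and sample count

u = rng.normal(size=d)
u /= np.linalg.norm(u)                     # hidden unit direction

X = rng.normal(size=(m, d))                # isotropic Gaussian inputs
g = lambda z: np.maximum(z, 0.0) ** 2      # placeholder link function
y = g(X @ u) + 0.1 * rng.normal(size=m)    # noisy label through the projection
```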

Implicit bias in leaky ReLU networks trained on high-dimensional data

S Frei, G Vardi, PL Bartlett, N Srebro, W Hu - arXiv preprint arXiv …, 2022 - arxiv.org
The implicit biases of gradient-based optimization algorithms are conjectured to be a major
factor in the success of modern deep learning. In this work, we investigate the implicit bias of …

High-performing neural network models of visual cortex benefit from high latent dimensionality

E Elmoznino, MF Bonner - PLOS Computational Biology, 2024 - journals.plos.org
Geometric descriptions of deep neural networks (DNNs) have the potential to uncover core
representational principles of computational models in neuroscience. Here we examined the …

Benign overfitting and grokking in ReLU networks for XOR cluster data

Z Xu, Y Wang, S Frei, G Vardi, W Hu - arXiv preprint arXiv:2310.02541, 2023 - arxiv.org
Neural networks trained by gradient descent (GD) have exhibited a number of surprising
generalization behaviors. First, they can achieve a perfect fit to noisy training data and still …
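
"XOR cluster data" in the title refers to a mixture of Gaussian clusters arranged so that opposite clusters share a label, making the classes non-linearly-separable. The sketch below is a hedged illustration of such a generator; the cluster means, dimension, and noise scale are chosen here for illustration and are not the paper's.

```python
# Illustrative XOR-style cluster data: +/-mu1 share label +1, +/-mu2 share label -1,
# so the classes form an XOR pattern that no linear classifier separates.
import numpy as np

rng = np.random.default_rng(2)
d, m_per = 50, 250                         # dimension and points per cluster (illustrative)

mu1 = np.zeros(d); mu1[0] = 3.0            # placeholder cluster means
mu2 = np.zeros(d); mu2[1] = 3.0

centers = [(+mu1, +1.0), (-mu1, +1.0), (+mu2, -1.0), (-mu2, -1.0)]
X = np.concatenate([c + rng.normal(size=(m_per, d)) for c, _ in centers])
y = np.concatenate([np.full(m_per, lab) for _, lab in centers])
```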

Understanding the generalization of adam in learning neural networks with proper regularization

D Zou, Y Cao, Y Li, Q Gu - arXiv preprint arXiv:2108.11371, 2021 - arxiv.org
Adaptive gradient methods such as Adam have gained increasing popularity in deep
learning optimization. However, it has been observed that compared with (stochastic) …

Pareto frontiers in deep feature learning: Data, compute, width, and luck

B Edelman, S Goel, S Kakade… - Advances in Neural …, 2024 - proceedings.neurips.cc
In modern deep learning, algorithmic choices (such as width, depth, and learning rate) are
known to modulate nuanced resource tradeoffs. This work investigates how these …