Hidden progress in deep learning: SGD learns parities near the computational limit

B Barak, B Edelman, S Goel… - Advances in …, 2022 - proceedings.neurips.cc
There is mounting evidence of emergent phenomena in the capabilities of deep learning
methods as we scale up datasets, model sizes, and training times. While there are some …
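
For readers unfamiliar with the task in the title: an (n, k)-sparse parity labels a sign vector by the product of k fixed coordinates, so the label has no linear correlation with any single input bit. The sketch below is illustrative only, not the authors' experimental setup; the width, step count, step size, and hinge loss are arbitrary choices made here to show plain SGD on the first layer of a small ReLU network.

```python
# Illustrative sketch of the sparse-parity task (not the paper's exact setup).
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 20, 3, 10_000                            # input dimension, parity size, sample count
support = rng.choice(n, size=k, replace=False)     # hidden coordinate subset S

X = rng.choice([-1.0, 1.0], size=(m, n))           # uniform sign inputs
y = X[:, support].prod(axis=1)                     # label = prod_{i in S} x_i

w, lr = 128, 0.05                                  # hidden width and step size (arbitrary)
W = rng.normal(scale=1.0 / np.sqrt(n), size=(w, n))
a = rng.choice([-1.0, 1.0], size=w) / w            # second layer held fixed

for step in range(20_000):                         # one-sample SGD on the hinge loss
    i = rng.integers(m)
    x_i, y_i = X[i], y[i]
    h = np.maximum(W @ x_i, 0.0)                   # hidden ReLU activations
    if y_i * (a @ h) < 1.0:                        # subgradient step only if margin < 1
        W += lr * y_i * np.outer(a * (h > 0.0), x_i)

acc = np.mean(np.sign(np.maximum(X @ W.T, 0.0) @ a) == y)
print(f"train accuracy: {acc:.3f}")
```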

High-dimensional asymptotics of feature learning: How one gradient step improves the representation

J Ba, MA Erdogdu, T Suzuki, Z Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a
two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma$ …
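
The displayed model is cut off by the snippet. A plausible reconstruction of the standard two-layer parameterization used in this line of work, together with the single first-layer gradient step the title refers to, is given below; the exact scaling, loss, and what is held fixed may differ in the paper.

```latex
\[
  f(\boldsymbol{x}) \;=\; \frac{1}{\sqrt{N}}\,\boldsymbol{a}^{\top}
  \sigma\!\left(\boldsymbol{W}\boldsymbol{x}\right),
  \qquad \boldsymbol{W}\in\mathbb{R}^{N\times d},\;
  \boldsymbol{a}\in\mathbb{R}^{N},
\]
\[
  \boldsymbol{W}_{1} \;=\; \boldsymbol{W}_{0}
  \;-\; \eta\,\nabla_{\boldsymbol{W}}\,
  \widehat{\mathcal{L}}(f)\,\Big|_{\boldsymbol{W}=\boldsymbol{W}_{0}},
\]
```

Here $\widehat{\mathcal{L}}$ denotes an empirical loss over the training sample; the object of study is the feature matrix after one step on $\boldsymbol{W}$, with the second layer typically held fixed during that step.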

On the role of attention in prompt-tuning

S Oymak, AS Rawat, M Soltanolkotabi… - International …, 2023 - proceedings.mlr.press
Prompt-tuning is an emerging strategy to adapt large language models (LLMs) to
downstream tasks by learning a (soft-) prompt parameter from data. Despite its success in …

Provable guarantees for neural networks via gradient feature learning

Z Shi, J Wei, Y Liang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …

Neural networks efficiently learn low-dimensional representations with SGD

A Mousavi-Hosseini, S Park, M Girotti… - arXiv preprint arXiv …, 2022 - arxiv.org
We study the problem of training a two-layer neural network (NN) of arbitrary width using
stochastic gradient descent (SGD) where the input $\boldsymbol{x} \in \mathbb{R}^d$ is …
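
The low-dimensional structure referenced in the title is, in this line of work, typically a single- or multi-index target: the label depends on the d-dimensional input only through its projection onto a few hidden directions. The generator below is a hypothetical example of that form; the direction u, link function g, and noise level are placeholders, not the paper's choices.

```python
# Hypothetical single-index data: the label depends on x only through <u, x>,
# so u is the "low-dimensional representation" a trained first layer must recover.
import numpy as np

rng = np.random.default_rng(1)
d, m = 100, 5_000                          # ambient dimension and sample count

u = rng.normal(size=d)
u /= np.linalg.norm(u)                     # hidden unit direction

X = rng.normal(size=(m, d))                # isotropic Gaussian inputs
g = lambda z: np.maximum(z, 0.0) ** 2      # placeholder link function
y = g(X @ u) + 0.1 * rng.normal(size=m)    # noisy label through the projection
```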

Implicit bias in leaky ReLU networks trained on high-dimensional data

S Frei, G Vardi, PL Bartlett, N Srebro, W Hu - arXiv preprint arXiv …, 2022 - arxiv.org
The implicit biases of gradient-based optimization algorithms are conjectured to be a major
factor in the success of modern deep learning. In this work, we investigate the implicit bias of …

High-performing neural network models of visual cortex benefit from high latent dimensionality

E Elmoznino, MF Bonner - PLOS Computational Biology, 2024 - journals.plos.org
Geometric descriptions of deep neural networks (DNNs) have the potential to uncover core
representational principles of computational models in neuroscience. Here we examined the …

Benign overfitting and grokking in ReLU networks for XOR cluster data

Z Xu, Y Wang, S Frei, G Vardi, W Hu - arXiv preprint arXiv:2310.02541, 2023 - arxiv.org
Neural networks trained by gradient descent (GD) have exhibited a number of surprising
generalization behaviors. First, they can achieve a perfect fit to noisy training data and still …
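
"XOR cluster data" in the title refers to a mixture of Gaussian clusters arranged so that opposite clusters share a label, making the classes non-linearly-separable. The sketch below is a hedged illustration of such a generator; the cluster means, dimension, and noise scale are chosen here for illustration and are not the paper's.

```python
# Illustrative XOR-style cluster data: +/-mu1 share label +1, +/-mu2 share label -1,
# so the classes form an XOR pattern that no linear classifier separates.
import numpy as np

rng = np.random.default_rng(2)
d, m_per = 50, 250                         # dimension and points per cluster (illustrative)

mu1 = np.zeros(d); mu1[0] = 3.0            # placeholder cluster means
mu2 = np.zeros(d); mu2[1] = 3.0

centers = [(+mu1, +1.0), (-mu1, +1.0), (+mu2, -1.0), (-mu2, -1.0)]
X = np.concatenate([c + rng.normal(size=(m_per, d)) for c, _ in centers])
y = np.concatenate([np.full(m_per, lab) for _, lab in centers])
```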

Understanding the generalization of adam in learning neural networks with proper regularization

D Zou, Y Cao, Y Li, Q Gu - arXiv preprint arXiv:2108.11371, 2021 - arxiv.org
Adaptive gradient methods such as Adam have gained increasing popularity in deep
learning optimization. However, it has been observed that compared with (stochastic) …

Pareto frontiers in deep feature learning: Data, compute, width, and luck

B Edelman, S Goel, S Kakade… - Advances in Neural …, 2024 - proceedings.neurips.cc
In modern deep learning, algorithmic choices (such as width, depth, and learning rate) are
known to modulate nuanced resource tradeoffs. This work investigates how these …