Hidden progress in deep learning: SGD learns parities near the computational limit
There is mounting evidence of emergent phenomena in the capabilities of deep learning
methods as we scale up datasets, model sizes, and training times. While there are some …
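The snippet cuts off before the task is defined; going by the title, the object of study is SGD learning a $k$-sparse parity over $n$ input bits. A minimal sketch of that setup (the subset S, network width, and learning rate are illustrative assumptions, not the paper's configuration):

```python
# Minimal (n, k)-sparse parity task suggested by the title: x is uniform over
# {-1, +1}^n and the label is the product of k hidden coordinates.
import torch

n, k = 50, 3                      # input bits, parity degree
S = list(range(k))                # hidden relevant coordinates (assumed)

def sample_batch(b):
    x = torch.randint(0, 2, (b, n)).float() * 2 - 1   # uniform over {-1, +1}^n
    y = x[:, S].prod(dim=1)                            # parity of the k bits in S
    return x, y

model = torch.nn.Sequential(torch.nn.Linear(n, 128), torch.nn.ReLU(),
                            torch.nn.Linear(128, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.05)

for step in range(5000):          # online SGD on fresh samples
    x, y = sample_batch(64)
    loss = (model(x).squeeze(-1) - y).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 1000 == 0:
        print(step, loss.item())
```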
High-dimensional asymptotics of feature learning: How one gradient step improves the representation
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a
two-layer neural network: $f(\boldsymbol{x})=\frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma$ …
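The formula is truncated at $\sigma$; assuming it continues as $\sigma(\boldsymbol{W}\boldsymbol{x})$ with a squared loss (an assumption on my part), a minimal numpy sketch of the one full-batch gradient step on $\boldsymbol{W}$ described above:

```python
# One full-batch gradient step on W for f(x) = a^T sigma(W x) / sqrt(N).
# The teacher target, sigma = tanh, and all sizes are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d, N, n_train, eta = 100, 200, 500, 1.0

X = rng.standard_normal((n_train, d)) / np.sqrt(d)   # inputs
y = np.tanh(X @ rng.standard_normal(d))              # illustrative teacher target

W = rng.standard_normal((N, d)) / np.sqrt(d)         # first-layer weights
a = rng.standard_normal(N) / np.sqrt(N)              # second layer, held fixed

sigma, dsigma = np.tanh, lambda z: 1 - np.tanh(z) ** 2

def f(X, W):
    return sigma(X @ W.T) @ a / np.sqrt(N)

# Gradient of the empirical loss 0.5 * mean (f(x) - y)^2 with respect to W
pre = X @ W.T                                        # (n_train, N) pre-activations
err = (f(X, W) - y) / n_train                        # scaled residuals
grad_W = ((err[:, None] * dsigma(pre)) * (a / np.sqrt(N))).T @ X

W1 = W - eta * grad_W                                # the "one gradient step"
print(np.linalg.norm(W1 - W))                        # how much the features moved
```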
On the role of attention in prompt-tuning
Prompt-tuning is an emerging strategy to adapt large language models (LLMs) to
downstream tasks by learning a (soft) prompt parameter from data. Despite its success in …
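A minimal sketch of the soft prompt mechanism the abstract refers to: a small frozen transformer stands in for the LLM, and only the prepended prompt embeddings receive gradients. All sizes and the toy classification head are illustrative assumptions:

```python
# Soft prompt-tuning: freeze the "pretrained" model, train only the prompt.
import torch
import torch.nn as nn

d_model, p_len, vocab, n_cls = 64, 10, 1000, 2

embed = nn.Embedding(vocab, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
head = nn.Linear(d_model, n_cls)

for m in (embed, encoder, head):              # freeze the stand-in LLM
    for param in m.parameters():
        param.requires_grad_(False)

prompt = nn.Parameter(torch.randn(p_len, d_model) * 0.02)   # the soft prompt
opt = torch.optim.Adam([prompt], lr=1e-3)

tokens = torch.randint(0, vocab, (8, 20))                    # dummy batch
labels = torch.randint(0, n_cls, (8,))

x = torch.cat([prompt.expand(8, -1, -1), embed(tokens)], dim=1)  # prepend prompt
logits = head(encoder(x).mean(dim=1))
loss = nn.functional.cross_entropy(logits, labels)
opt.zero_grad(); loss.backward(); opt.step()                 # updates only the prompt
```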
Provable guarantees for neural networks via gradient feature learning
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …
Neural networks efficiently learn low-dimensional representations with SGD
We study the problem of training a two-layer neural network (NN) of arbitrary width using
stochastic gradient descent (SGD) where the input $\boldsymbol{x}\in\mathbb{R}^d$ is …
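The snippet truncates before stating the input and target structure; going by the title, a representative instance is a single-index target that depends on $\boldsymbol{x}$ only through one hidden direction (that specific choice is an assumption). A sketch of online SGD on such a task, checking how the first-layer weights align with the hidden direction:

```python
# Single-index target: the label depends on x only through <u, x>.
import torch

d, m = 128, 512
u = torch.randn(d); u /= u.norm()          # hidden relevant direction

def sample(b):
    x = torch.randn(b, d)
    y = torch.relu(x @ u)                  # link function g = ReLU (illustrative)
    return x, y

net = torch.nn.Sequential(torch.nn.Linear(d, m), torch.nn.ReLU(),
                          torch.nn.Linear(m, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.01)

for step in range(3000):                   # one fresh batch per step
    x, y = sample(32)
    loss = (net(x).squeeze(-1) - y).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

# Alignment of learned first-layer rows with the true direction u
W = net[0].weight.detach()
cos = (W @ u) / W.norm(dim=1)
print("max |cosine| with u:", cos.abs().max().item())
```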
Implicit bias in leaky ReLU networks trained on high-dimensional data
The implicit biases of gradient-based optimization algorithms are conjectured to be a major
factor in the success of modern deep learning. In this work, we investigate the implicit bias of …
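The snippet ends before the bias itself is described. As one illustrative probe (not necessarily the paper's analysis), one can train a two-layer leaky ReLU network on high-dimensional data with far fewer samples than dimensions and inspect the spectrum of the first-layer weights, where an implicit low-rank bias would show up:

```python
# Probe for implicit low-rank structure after GD in the d >> n regime.
# Data, labels, and all hyperparameters are illustrative assumptions.
import torch

d, n, m = 500, 40, 100                      # dim >> samples: high-dimensional regime
x = torch.randn(n, d)
y = (torch.randn(n) > 0).float() * 2 - 1    # random +/-1 labels (illustrative)

net = torch.nn.Sequential(torch.nn.Linear(d, m), torch.nn.LeakyReLU(0.1),
                          torch.nn.Linear(m, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.05)

for _ in range(2000):                       # full-batch gradient descent
    loss = torch.nn.functional.soft_margin_loss(net(x).squeeze(-1), y)
    opt.zero_grad(); loss.backward(); opt.step()

s = torch.linalg.svdvals(net[0].weight.detach())
print("top 5 singular values:", (s[:5] / s[0]).tolist())   # fast decay hints at low rank
```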
High-performing neural network models of visual cortex benefit from high latent dimensionality
E Elmoznino, MF Bonner - PLOS Computational Biology, 2024 - journals.plos.org
Geometric descriptions of deep neural networks (DNNs) have the potential to uncover core
representational principles of computational models in neuroscience. Here we examined the …
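A standard way to quantify the latent dimensionality of a model's responses is the participation ratio of the covariance eigenvalues, $(\sum_i \lambda_i)^2 / \sum_i \lambda_i^2$; whether this exact metric matches the paper's effective-dimensionality measure is an assumption here:

```python
# Participation ratio as an effective-dimensionality measure for DNN responses.
import numpy as np

def effective_dimensionality(R):
    """R: (n_stimuli, n_units) matrix of model responses."""
    R = R - R.mean(axis=0, keepdims=True)
    lam = np.linalg.eigvalsh(np.cov(R, rowvar=False))
    lam = np.clip(lam, 0, None)              # guard against tiny negative eigenvalues
    return lam.sum() ** 2 / (lam ** 2).sum()

rng = np.random.default_rng(0)
low  = rng.standard_normal((1000, 5)) @ rng.standard_normal((5, 200))   # ~5 latent dims
high = rng.standard_normal((1000, 200))                                 # ~200 latent dims
print(effective_dimensionality(low), effective_dimensionality(high))
```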
Benign overfitting and grokking in ReLU networks for XOR cluster data
Neural networks trained by gradient descent (GD) have exhibited a number of surprising
generalization behaviors. First, they can achieve a perfect fit to noisy training data and still …
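A sketch of the XOR cluster setup the title names: four Gaussian clusters at $\pm\boldsymbol{\mu}_1, \pm\boldsymbol{\mu}_2$ whose labels follow an XOR pattern, with a fraction of labels flipped so that a perfect fit necessarily memorizes noise. Cluster means, noise level, and network size are illustrative choices:

```python
# XOR cluster data with label noise, fit by a two-layer ReLU net under GD.
import torch

d, n_per, flip = 50, 100, 0.1
mu1, mu2 = torch.zeros(d), torch.zeros(d)
mu1[0], mu2[1] = 5.0, 5.0

centers = torch.stack([mu1, -mu1, mu2, -mu2])
labels  = torch.tensor([1., 1., -1., -1.])            # XOR: +/-mu1 -> +1, +/-mu2 -> -1

x = torch.cat([c + torch.randn(n_per, d) for c in centers])
y = labels.repeat_interleave(n_per).clone()
noise = torch.rand(len(y)) < flip
y[noise] *= -1                                         # flipped (noisy) labels

net = torch.nn.Sequential(torch.nn.Linear(d, 200), torch.nn.ReLU(),
                          torch.nn.Linear(200, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.01)
for _ in range(3000):                                  # full-batch GD
    loss = torch.nn.functional.soft_margin_loss(net(x).squeeze(-1), y)
    opt.zero_grad(); loss.backward(); opt.step()

train_acc = ((net(x).squeeze(-1) > 0).float() * 2 - 1 == y).float().mean()
print("train accuracy (fits the noisy labels too):", train_acc.item())
```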
Understanding the generalization of adam in learning neural networks with proper regularization
Adaptive gradient methods such as Adam have gained increasing popularity in deep
learning optimization. However, it has been observed that compared with (stochastic) …
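The comparison is cut off; the title's "proper regularization" suggests that explicit regularization is what lets Adam generalize. A hedged sketch contrasting plain Adam with Adam plus weight decay (treating the regularizer as $\ell_2$-style weight decay is an assumption; the paper's precise regularizer may differ):

```python
# Same net and data, trained with Adam with and without weight decay.
import torch

def train(weight_decay):
    torch.manual_seed(0)
    x, y = torch.randn(200, 20), torch.randn(200)
    net = torch.nn.Sequential(torch.nn.Linear(20, 100), torch.nn.ReLU(),
                              torch.nn.Linear(100, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3, weight_decay=weight_decay)
    for _ in range(500):
        loss = (net(x).squeeze(-1) - y).pow(2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return sum(p.norm() ** 2 for p in net.parameters()).item()

print("param norm^2, no regularization:", train(0.0))
print("param norm^2, weight decay 1e-2:", train(1e-2))
```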
Pareto frontiers in deep feature learning: Data, compute, width, and luck
In modern deep learning, algorithmic choices (such as width, depth, and learning rate) are
known to modulate nuanced resource tradeoffs. This work investigates how these …