Quantum variational algorithms are swamped with traps
ER Anschuetz, BT Kiani - Nature Communications, 2022 - nature.com
One of the most important properties of classical neural networks is how surprisingly
trainable they are, though their training algorithms typically rely on optimizing complicated …
Smoothing the landscape boosts the signal for SGD: Optimal sample complexity for learning single index models
We focus on the task of learning a single index model $\sigma(w^\star \cdot x)$ with respect
to the isotropic Gaussian distribution in $d$ dimensions. Prior work has shown that the …
Machine un-learning: an overview of techniques, applications, and future directions
ML applications proliferate across various sectors. Large internet firms employ ML to train
intelligent models using vast datasets, including sensitive user information. However, new …
Statistical algorithms and a lower bound for detecting planted cliques
We introduce a framework for proving lower bounds on computational problems over
distributions against algorithms that can be implemented using access to a statistical query …
Superpolynomial lower bounds for learning one-layer neural networks using gradient descent
We give the first superpolynomial lower bounds for learning one-layer neural networks with
respect to the Gaussian distribution for a broad class of algorithms. In the regression setting …
Near-optimal SQ lower bounds for agnostically learning halfspaces and ReLUs under Gaussian marginals
We study the fundamental problems of agnostically learning halfspaces and ReLUs under
Gaussian marginals. In the former problem, given labeled examples $(\bx, y)$ from an …
The optimality of polynomial regression for agnostic learning under Gaussian marginals in the SQ model
We study the problem of agnostic learning under the Gaussian distribution in the Statistical
Query (SQ) model. We develop a method for finding hard families of examples for a wide …
Algorithms and SQ lower bounds for PAC learning one-hidden-layer ReLU networks
I Diakonikolas, DM Kane… - … on Learning Theory, 2020 - proceedings.mlr.press
We study the problem of PAC learning one-hidden-layer ReLU networks with $k$ hidden
units on $\mathbb{R}^d$ under Gaussian marginals in the presence of additive label …
Time/accuracy tradeoffs for learning a ReLU with respect to Gaussian marginals
We consider the problem of computing the best-fitting ReLU with respect to square-loss on a
training set when the examples have been drawn according to a spherical Gaussian …
Provably learning a multi-head attention layer
S Chen, Y Li - arXiv preprint arXiv:2402.04084, 2024 - arxiv.org
The multi-head attention layer is one of the key components of the transformer architecture
that sets it apart from traditional feed-forward models. Given a sequence length $k …