Efficient algorithms for learning from coarse labels

D Fotakis, A Kalavasis, V Kontonis… - … on Learning Theory, 2021 - proceedings.mlr.press
For many learning problems one may not have access to fine-grained label information; e.g.,
an image can be labeled as husky, dog, or even animal depending on the expertise of the …

SGD learns one-layer networks in WGANs

Q Lei, J Lee, A Dimakis… - … Conference on Machine …, 2020 - proceedings.mlr.press
Generative adversarial networks (GANs) are a widely used framework for learning
generative models. Wasserstein GANs (WGANs), one of the most successful variants of …

Learning (very) simple generative models is hard

S Chen, J Li, Y Li - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Motivated by the recent empirical successes of deep generative models, we study the
computational complexity of the following unsupervised learning problem. For an unknown …

A modular analysis of provable acceleration via Polyak's momentum: Training a wide ReLU network and a deep linear network

JK Wang, CH Lin, JD Abernethy - … Conference on Machine …, 2021 - proceedings.mlr.press
Incorporating a so-called “momentum” dynamic in gradient descent methods is widely used
in neural net training as it has been broadly observed that, at least empirically, it often leads …
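
For concreteness, here is a minimal sketch of the heavy-ball (Polyak momentum) update that the snippet refers to, applied to a toy quadratic objective; the step size and momentum value below are illustrative choices, not the ones analyzed in the paper.

```python
import numpy as np

# Toy quadratic objective f(w) = 0.5 * w^T A w - b^T w, with gradient A w - b.
A = np.array([[3.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])
grad = lambda w: A @ w - b

eta, beta = 0.1, 0.9          # step size and momentum parameter (illustrative values)
w_prev = w = np.zeros(2)

for _ in range(200):
    # Polyak's heavy-ball update: gradient step plus a multiple of the last displacement.
    w, w_prev = w - eta * grad(w) + beta * (w - w_prev), w

print(w, np.linalg.solve(A, b))   # iterate vs. the exact minimizer A^{-1} b
```

The only change relative to plain gradient descent is the extra beta * (w - w_prev) displacement term.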

Learning a 1-layer conditional generative model in total variation

A Jalal, J Kang, A Uppal… - Advances in Neural …, 2024 - proceedings.neurips.cc
A conditional generative model is a method for sampling from a conditional distribution
$p(y \mid x)$. For example, one may want to sample an image of a cat given the label "cat". A …
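
The paper's exact model is not reproduced here, but as a hypothetical illustration of what a one-layer conditional generator can look like, the sketch below pushes a condition vector and fresh Gaussian noise through a single ReLU layer; every dimension and parameter name is made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-layer conditional generator: given a condition x (e.g. a label
# embedding), draw latent noise z ~ N(0, I) and output y = ReLU(W x + V z + b).
d_x, d_z, d_y = 4, 8, 16
W = rng.normal(size=(d_y, d_x))
V = rng.normal(size=(d_y, d_z))
b = rng.normal(size=d_y)

def sample_conditional(x, n_samples=5):
    """Draw n_samples from the model's conditional distribution p(y | x)."""
    z = rng.normal(size=(n_samples, d_z))
    return np.maximum(W @ x + z @ V.T + b, 0.0)   # ReLU activation

x_cat = rng.normal(size=d_x)            # stand-in embedding for the label "cat"
print(sample_conditional(x_cat).shape)  # (5, 16)
```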

Learning polynomial transformations via generalized tensor decompositions

S Chen, J Li, Y Li, AR Zhang - Proceedings of the 55th Annual ACM …, 2023 - dl.acm.org
We consider the problem of learning high dimensional polynomial transformations of
Gaussians. Given samples of the form f(x), where x ∼ N(0, I_r) is hidden and f: ℝ^r → ℝ^d is a …
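
As a hypothetical illustration of the data model described in the snippet (not code from the paper), the sketch below draws hidden seeds x ∼ N(0, I_r) and releases only the transformed samples f(x), with f taken to be a random quadratic map for concreteness.

```python
import numpy as np

rng = np.random.default_rng(1)
r, d, n = 3, 10, 1000          # hidden dimension, observed dimension, sample count

# Random degree-2 polynomial map f: R^r -> R^d with quadratic coefficients A,
# linear coefficients B, and constant c; these are illustrative, not the paper's construction.
A = rng.normal(size=(d, r, r))
B = rng.normal(size=(d, r))
c = rng.normal(size=d)

def f(x):
    return np.einsum('dij,i,j->d', A, x, x) + B @ x + c

X_hidden = rng.normal(size=(n, r))             # x ~ N(0, I_r), never shown to the learner
samples = np.array([f(x) for x in X_hidden])   # the learner only observes f(x)
print(samples.shape)                            # (1000, 10)
```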

Improved linear convergence of training CNNs with generalizability guarantees: A one-hidden-layer case

S Zhang, M Wang, J Xiong, S Liu… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
We analyze the learning problem of one-hidden-layer nonoverlapping convolutional neural
networks with the rectified linear unit (ReLU) activation function from the perspective of …

Lower bounds on the total variation distance between mixtures of two Gaussians

S Davies, A Mazumdar, S Pal… - International …, 2022 - proceedings.mlr.press
Mixtures of high dimensional Gaussian distributions have been studied extensively in
statistics and learning theory. While the total variation distance appears naturally in the …
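
Not from the paper, but to make the quantity concrete: the sketch below numerically estimates the total variation distance TV(p, q) = ½ ∫ |p − q| between two one-dimensional mixtures of two Gaussians on a grid; the mixture parameters are arbitrary examples.

```python
import numpy as np
from scipy.stats import norm

def mixture_pdf(x, weights, means, stds):
    """Density of a Gaussian mixture evaluated on the grid x."""
    return sum(w * norm.pdf(x, m, s) for w, m, s in zip(weights, means, stds))

x = np.linspace(-15, 15, 200001)                           # dense grid for numerical integration
p = mixture_pdf(x, [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])    # arbitrary example mixtures
q = mixture_pdf(x, [0.3, 0.7], [-1.0, 2.0], [1.0, 1.5])

# TV(p, q) = 0.5 * integral of |p - q|, approximated by the trapezoidal rule.
tv = 0.5 * np.trapz(np.abs(p - q), x)
print(f"estimated TV distance: {tv:.4f}")
```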

Agnostic learning of general ReLU activation using gradient descent

P Awasthi, A Tang, A Vijayaraghavan - arXiv preprint arXiv:2208.02711, 2022 - arxiv.org
We provide a convergence analysis of gradient descent for the problem of agnostically
learning a single ReLU function under Gaussian distributions. Unlike prior work that studies …
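
As a simplified sketch of the kind of procedure the snippet describes (not the paper's algorithm or its agnostic analysis), the code below runs plain gradient descent on the empirical squared loss for a single ReLU with standard Gaussian inputs and a noisy ReLU teacher.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 5, 20000

w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)

X = rng.normal(size=(n, d))                                    # x ~ N(0, I_d)
y = np.maximum(X @ w_star, 0.0) + 0.1 * rng.normal(size=n)     # noisy ReLU labels

w = 0.1 * rng.normal(size=d)                                   # small random initialization
eta = 0.5
for _ in range(500):
    pred = np.maximum(X @ w, 0.0)
    # Gradient of the empirical squared loss; ReLU's derivative is the indicator X @ w > 0.
    grad = ((pred - y)[:, None] * (X @ w > 0)[:, None] * X).mean(axis=0)
    w -= eta * grad

print(np.linalg.norm(w - w_star))   # should shrink as gradient descent fits the data
```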

A mathematical framework for learning probability distributions

H Yang - arXiv preprint arXiv:2212.11481, 2022 - arxiv.org
The modeling of probability distributions, specifically generative modeling and density
estimation, has become an immensely popular subject in recent years by virtue of its …