Statistical, robustness, and computational guarantees for sliced Wasserstein distances

S Nietert, Z Goldfeld, R Sadhu… - Advances in Neural …, 2022 - proceedings.neurips.cc
Sliced Wasserstein distances preserve properties of classic Wasserstein distances while
being more scalable for computation and estimation in high dimensions. The goal of this …
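
As a reference point for what these objects compute, here is a minimal Monte Carlo sketch of the sliced 2-Wasserstein distance between two empirical measures (the projection count and the equal sample sizes are simplifying assumptions of this sketch, not anything taken from the paper):

    import numpy as np

    def sliced_w2(X, Y, n_proj=200, seed=0):
        """Monte Carlo estimate of SW_2 between the empirical measures
        on the rows of X and Y (assumes equal sample sizes)."""
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        total = 0.0
        for _ in range(n_proj):
            theta = rng.standard_normal(d)
            theta /= np.linalg.norm(theta)  # uniform direction on the sphere
            # 1D W_2 between equal-size samples: pair sorted projections.
            u, v = np.sort(X @ theta), np.sort(Y @ theta)
            total += np.mean((u - v) ** 2)
        return np.sqrt(total / n_proj)

    rng = np.random.default_rng(1)
    X = rng.standard_normal((500, 50))
    Y = rng.standard_normal((500, 50)) + 0.3
    print(sliced_w2(X, Y))

Each projection reduces the comparison to a one-dimensional optimal-transport problem (a sort), which is the source of the scalability the snippet mentions.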

The Gaussian equivalence of generative models for learning with shallow neural networks

S Goldt, B Loureiro, G Reeves… - Mathematical and …, 2022 - proceedings.mlr.press
Understanding the impact of data structure on the computational tractability of learning is a
key challenge for the theory of neural networks. Many theoretical works do not explicitly …
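
A toy version of the Gaussian-equivalence idea can be sketched as follows: data from a structured generator is replaced by Gaussian data with matched first and second moments (the ReLU generator and all dimensions are assumptions of this sketch, not the paper's setting):

    import numpy as np

    rng = np.random.default_rng(0)
    d_latent, d_input, n = 64, 128, 5000

    # Structured inputs from a one-layer generator z -> relu(Wz).
    W = rng.standard_normal((d_input, d_latent)) / np.sqrt(d_latent)
    X = np.maximum(rng.standard_normal((n, d_latent)) @ W.T, 0.0)

    # Gaussian-equivalent surrogate: same mean and covariance as X.
    G = rng.multivariate_normal(X.mean(0), np.cov(X, rowvar=False), size=n)

    # Low-dimensional statistics of the two datasets should agree.
    a = rng.standard_normal(d_input) / np.sqrt(d_input)
    print(np.var(X @ a), np.var(G @ a))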

Fast approximation of the sliced-Wasserstein distance using concentration of random projections

K Nadjahi, A Durmus, PE Jacob… - Advances in …, 2021 - proceedings.neurips.cc
The Sliced-Wasserstein distance (SW) is increasingly used in machine
learning applications as an alternative to the Wasserstein distance and offers significant …
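
One way to see the speedup: if each projected measure is treated as a one-dimensional Gaussian whose variance concentrates around tr(cov)/d, then SW_2^2 admits a closed-form surrogate requiring no projections at all. The derivation below is this sketch's reading of the concentration argument, not necessarily the paper's exact estimator:

    import numpy as np

    def sw2_gaussian_surrogate(X, Y):
        """Projection-free surrogate for SW_2^2: treat each projected
        measure as N(theta . mean, tr(cov)/d) and average over theta."""
        d = X.shape[1]
        dm = X.mean(0) - Y.mean(0)
        sx = np.sqrt(np.trace(np.cov(X, rowvar=False)) / d)
        sy = np.sqrt(np.trace(np.cov(Y, rowvar=False)) / d)
        # E_theta[(theta . dm)^2] = ||dm||^2 / d for uniform theta, and
        # W_2^2(N(a, s^2), N(b, t^2)) = (a - b)^2 + (s - t)^2.
        return dm @ dm / d + (sx - sy) ** 2

Against the Monte Carlo estimator sketched above, this costs one pass over the data rather than one sort per projection.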

k-Sliced Mutual Information: A Quantitative Study of Scalability with Dimension

Z Goldfeld, K Greenewald… - Advances in neural …, 2022 - proceedings.neurips.cc
Sliced mutual information (SMI) is defined as an average of mutual information (MI) terms
between one-dimensional random projections of the random variables. It serves as a …
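
Taking the definition in the snippet at face value, a plug-in estimate averages a scalar MI estimator over random projection pairs; the projection count and the kNN-based estimator from scikit-learn are choices made for this sketch, not the paper's estimator:

    import numpy as np
    from sklearn.feature_selection import mutual_info_regression

    def sliced_mi(X, Y, n_proj=100, seed=0):
        """Average MI between 1D random projections of X and of Y."""
        rng = np.random.default_rng(seed)
        dx, dy = X.shape[1], Y.shape[1]
        vals = []
        for _ in range(n_proj):
            theta = rng.standard_normal(dx)
            phi = rng.standard_normal(dy)
            u = (X @ (theta / np.linalg.norm(theta))).reshape(-1, 1)
            v = Y @ (phi / np.linalg.norm(phi))
            # kNN-based MI estimate between the two scalar samples.
            vals.append(mutual_info_regression(u, v)[0])
        return float(np.mean(vals))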

The replica-symmetric prediction for random linear estimation with Gaussian matrices is exact

G Reeves, HD Pfister - IEEE Transactions on Information …, 2019 - ieeexplore.ieee.org
This paper considers the fundamental limit of random linear estimation for iid signal
distributions and iid Gaussian measurement matrices. Its main contribution is a rigorous …

The all-or-nothing phenomenon in sparse linear regression

G Reeves, J Xu, I Zadik - Conference on Learning Theory, 2019 - proceedings.mlr.press
We study the problem of recovering a hidden binary $k$-sparse $p$-dimensional vector
$\beta$ from $n$ noisy linear observations $Y = X\beta + W$, where $X_{ij}$ are iid …
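
The observation model is easy to simulate; the dimensions, noise level, and the Lasso decoder below are illustrative assumptions (the paper's result concerns the information-theoretic limit, not any particular algorithm):

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    p, k, n, sigma = 1000, 10, 300, 0.5

    beta = np.zeros(p)                 # hidden binary k-sparse vector
    beta[rng.choice(p, size=k, replace=False)] = 1.0

    X = rng.standard_normal((n, p))    # iid Gaussian design
    Y = X @ beta + sigma * rng.standard_normal(n)

    est = Lasso(alpha=0.1).fit(X, Y).coef_
    print(np.sum((est - beta) ** 2) / k)   # normalized squared error

The all-or-nothing phenomenon concerns how the error of the optimal estimator jumps as $n$ crosses a threshold; the Lasso here is only a stand-in decoder for illustration.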

Sliced-Wasserstein distance for large-scale machine learning: theory, methodology and extensions

K Nadjahi - 2021 - theses.hal.science
Many methods for statistical inference and generative modeling rely on a probability
divergence to effectively compare two probability distributions. The Wasserstein distance …

Additivity of information in multilayer networks via additive Gaussian noise transforms

G Reeves - 2017 55th Annual Allerton Conference on …, 2017 - ieeexplore.ieee.org
Multilayer (or deep) networks are powerful probabilistic models based on multiple stages of
a linear transform followed by a non-linear (possibly random) function. In general, the linear …
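
The model class in question can be written down directly; the widths, the tanh nonlinearity, and the per-stage noise level below are assumptions of this sketch:

    import numpy as np

    rng = np.random.default_rng(0)
    dims = [100, 80, 60, 40]           # widths of successive layers
    weights = [rng.standard_normal((o, i)) / np.sqrt(i)
               for i, o in zip(dims[:-1], dims[1:])]

    def forward(x, noise=0.1):
        """Each stage: a linear transform, then a nonlinearity, with
        additive Gaussian noise making the map random."""
        for A in weights:
            x = np.tanh(A @ x) + noise * rng.standard_normal(A.shape[0])
        return x

    print(forward(rng.standard_normal(dims[0])).shape)   # (40,)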

Fundamental limits of overparametrized shallow neural networks for supervised learning

F Camilli, D Tieplova, J Barbier - arXiv preprint arXiv:2307.05635, 2023 - arxiv.org
We carry out an information-theoretical analysis of a two-layer neural network trained from
input-output pairs generated by a teacher network with matching architecture, in …
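
The data-generating side of that setup is simple to state; the widths, activation, and sample size below are assumptions of this sketch, and the student's training loop is omitted:

    import numpy as np

    rng = np.random.default_rng(0)
    d, m, n = 50, 10, 2000             # input dim, hidden width, samples

    # Fixed teacher: a two-layer network producing the labels.
    W = rng.standard_normal((m, d)) / np.sqrt(d)
    a = rng.standard_normal(m) / np.sqrt(m)

    X = rng.standard_normal((n, d))
    Y = np.tanh(X @ W.T) @ a           # input-output pairs for the student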

Mutual information as a function of matrix SNR for linear Gaussian channels

G Reeves, HD Pfister, A Dytso - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
This paper focuses on the mutual information and minimum mean-squared error (MMSE) as
a function of a matrix-valued signal-to-noise ratio (SNR) for a linear Gaussian channel with …
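
For a Gaussian input the quantity in question has the classical closed form I(X; Y) = (1/2) log det(I + AA^T) for the channel Y = AX + N with white Gaussian noise; the paper's interest is in general signals, so the sketch below covers only the Gaussian baseline:

    import numpy as np

    def gaussian_mi(A):
        """I(X; Y) in nats for Y = AX + N with X ~ N(0, I), N ~ N(0, I)."""
        m = A.shape[0]
        _, logdet = np.linalg.slogdet(np.eye(m) + A @ A.T)
        return 0.5 * logdet

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 8)) / np.sqrt(8)
    print(gaussian_mi(A))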