Statistical, robustness, and computational guarantees for sliced Wasserstein distances
Sliced Wasserstein distances preserve properties of classic Wasserstein distances while
being more scalable for computation and estimation in high dimensions. The goal of this …
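As an annotation: the estimator such guarantees typically concern is the Monte Carlo sliced Wasserstein estimator built from random one-dimensional projections. Below is a minimal numpy sketch of that estimator, assuming equal sample sizes so the 1D distance reduces to sorted projections; the function name and the Gaussian test data are illustrative, not taken from the paper.

```python
import numpy as np

def sliced_wasserstein_2(X, Y, n_projections=200, rng=None):
    """Monte Carlo estimate of SW_2 between two empirical measures.

    X, Y: (n, d) and (m, d) sample arrays with n == m here, so the 1D
    Wasserstein-2 distance reduces to comparing sorted projections.
    """
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    # Random directions drawn uniformly from the unit sphere S^{d-1}.
    theta = rng.standard_normal((n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both samples onto each direction: shape (n, n_projections).
    Xp, Yp = X @ theta.T, Y @ theta.T
    # For equal-size 1D samples, W_2^2 is the mean squared difference of
    # order statistics (sorted values), computed per direction.
    sq = np.mean((np.sort(Xp, axis=0) - np.sort(Yp, axis=0)) ** 2, axis=0)
    return np.sqrt(np.mean(sq))

# Illustrative check: two Gaussians with shifted means in d = 50.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))
Y = rng.standard_normal((1000, 50)) + 0.5
print(sliced_wasserstein_2(X, Y, rng=1))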
The Gaussian equivalence of generative models for learning with shallow neural networks
Understanding the impact of data structure on the computational tractability of learning is a
key challenge for the theory of neural networks. Many theoretical works do not explicitly …
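A hedged numerical illustration of the Gaussian equivalence idea: low-dimensional statistics of structured data (here, data pushed through a random one-layer generator, an illustrative stand-in) are compared to those of a Gaussian surrogate with matched mean and covariance. The generator, sizes, and names are assumptions of this note, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
k, d, n = 20, 200, 20000

# Illustrative structured data: x = tanh(A z), z Gaussian in a lower
# dimension (a stand-in for a generative model; not the paper's model).
A = rng.standard_normal((d, k)) / np.sqrt(k)
Z = rng.standard_normal((n, k))
X = np.tanh(Z @ A.T)

# Gaussian surrogate with the same first two moments.
mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)
G = rng.multivariate_normal(mu, cov, size=n)

# Compare the distribution of a fixed random projection w^T x under the
# two data sources; Gaussian equivalence predicts a close match.
w = rng.standard_normal(d) / np.sqrt(d)
for q in (0.1, 0.5, 0.9):
    print(q, np.quantile(X @ w, q), np.quantile(G @ w, q))
```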
Fast approximation of the sliced-Wasserstein distance using concentration of random projections
The Sliced-Wasserstein distance (SW) is being increasingly used in machine
learning applications as an alternative to the Wasserstein distance and offers significant …
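A small demonstration of the concentration phenomenon the title refers to: for theta uniform on the sphere and a fixed point x, the projection theta^T x is close to a centered Gaussian with variance ||x||^2 / d in high dimension. This illustrates the ingredient, not the paper's approximation formula; scipy is assumed available for the goodness-of-fit check.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
for d in (3, 30, 300):
    x = rng.standard_normal(d)               # a fixed point in R^d
    theta = rng.standard_normal((20000, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    proj = theta @ x                          # projections onto random directions
    # Compare against the Gaussian N(0, ||x||^2 / d) predicted by
    # concentration of measure on the sphere; the KS statistic shrinks
    # as the dimension d grows.
    ks = stats.kstest(proj, "norm", args=(0, np.linalg.norm(x) / np.sqrt(d)))
    print(d, ks.statistic)
```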
k-Sliced Mutual Information: A Quantitative Study of Scalability with Dimension
Z Goldfeld, K Greenewald… - Advances in neural …, 2022 - proceedings.neurips.cc
Sliced mutual information (SMI) is defined as an average of mutual information (MI) terms
between one-dimensional random projections of the random variables. It serves as a …
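A hedged sketch of the quantity being averaged: for jointly Gaussian 1D projections, mutual information has the closed form -0.5 log(1 - rho^2), so the sliced mutual information of a Gaussian pair can be estimated by averaging that expression over independent random projection directions. The function name and the Gaussian restriction are illustrative assumptions.

```python
import numpy as np

def gaussian_smi_estimate(X, Y, n_projections=2000, rng=None):
    """Estimate sliced mutual information assuming jointly Gaussian data.

    SMI averages MI(theta^T X; phi^T Y) over independent uniform
    directions theta, phi; for Gaussians, MI = -0.5 * log(1 - rho^2).
    """
    rng = np.random.default_rng(rng)
    dx, dy = X.shape[1], Y.shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.standard_normal(dx); theta /= np.linalg.norm(theta)
        phi = rng.standard_normal(dy);   phi /= np.linalg.norm(phi)
        rho = np.corrcoef(X @ theta, Y @ phi)[0, 1]
        total += -0.5 * np.log1p(-rho ** 2)   # log1p(-r^2) = log(1 - r^2)
    return total / n_projections

# Illustrative use: Y is a noisy copy of X in d = 10.
rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 10))
Y = X + rng.standard_normal((5000, 10))
print(gaussian_smi_estimate(X, Y, rng=1))
```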
The replica-symmetric prediction for random linear estimation with Gaussian matrices is exact
G Reeves, HD Pfister - IEEE Transactions on Information …, 2019 - ieeexplore.ieee.org
This paper considers the fundamental limit of random linear estimation for iid signal
distributions and iid Gaussian measurement matrices. Its main contribution is a rigorous …
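For orientation, the model and the limiting quantity in question, written in a standard formulation (normalizations are illustrative, and the explicit replica-symmetric potential is not reproduced here):

```latex
% Random linear estimation with an iid signal and an iid Gaussian matrix:
\[
  y = A x + \sigma w, \qquad A_{ij} \overset{\text{iid}}{\sim} \mathcal{N}(0, 1/n),
  \qquad x_i \overset{\text{iid}}{\sim} P_X, \qquad w \sim \mathcal{N}(0, I_n).
\]
% The limit being characterized is the normalized mutual information in the
% proportional regime, shown to equal the replica-symmetric prediction: the
% minimum of an explicit scalar potential over an effective noise level E.
\[
  \lim_{\substack{n, p \to \infty \\ n/p \to \delta}} \frac{1}{p}\, I(x; y)
  \;=\; \min_{E \ge 0} \, \mathcal{I}_{\mathrm{RS}}(E, \delta).
\]
```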
The all-or-nothing phenomenon in sparse linear regression
We study the problem of recovering a hidden binary $k$-sparse $p$-dimensional vector
$\beta$ from $n$ noisy linear observations $Y = X\beta + W$, where $X_{ij}$ are iid …
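A sketch of the sharp transition the title refers to, with noise $W \sim \mathcal{N}(0, \sigma^2 I_n)$: the normalized MMSE jumps from one to zero at a critical sample size. The threshold expression below is the standard statement for this model and should be treated as an assumption of this note rather than a quotation.

```latex
\[
  n^* \;=\; \frac{2k \log(p/k)}{\log(1 + k/\sigma^2)},
\]
% All-or-nothing: below the threshold estimation is no better than trivial,
% above it the posterior mean recovers beta almost perfectly.
\[
  \frac{\mathbb{E}\,\|\beta - \mathbb{E}[\beta \mid Y, X]\|^2}{k}
  \;\longrightarrow\;
  \begin{cases}
    1, & n \le (1-\epsilon)\, n^*, \\[2pt]
    0, & n \ge (1+\epsilon)\, n^*.
  \end{cases}
\]
```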
Sliced-Wasserstein distance for large-scale machine learning: theory, methodology and extensions
K Nadjahi - 2021 - theses.hal.science
Many methods for statistical inference and generative modeling rely on a probability
divergence to effectively compare two probability distributions. The Wasserstein distance …
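For reference, the two divergences the thesis is about, in standard notation: $\theta_*$ denotes the projection $x \mapsto \langle \theta, x \rangle$, $\#$ the pushforward, and $\sigma$ the uniform measure on the unit sphere.

```latex
\[
  W_p(\mu, \nu) \;=\; \Big( \inf_{\pi \in \Pi(\mu, \nu)}
    \int \|x - y\|^p \, \mathrm{d}\pi(x, y) \Big)^{1/p},
\]
\[
  \mathrm{SW}_p^p(\mu, \nu) \;=\; \int_{\mathbb{S}^{d-1}}
    W_p^p\big( (\theta_*)_{\#}\mu,\; (\theta_*)_{\#}\nu \big) \, \mathrm{d}\sigma(\theta).
\]
```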
Additivity of information in multilayer networks via additive Gaussian noise transforms
G Reeves - 2017 55th Annual Allerton Conference on …, 2017 - ieeexplore.ieee.org
Multilayer (or deep) networks are powerful probabilistic models based on multiple stages of
a linear transform followed by a non-linear (possibly random) function. In general, the linear …
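An illustrative formulation of the multilayer model the snippet describes, consistent with its wording though the notation is this note's own:

```latex
% Each stage applies a linear transform A_l followed by a possibly random
% nonlinearity phi_l with fresh randomness xi_l:
\[
  x^{(\ell+1)} \;=\; \varphi_\ell\big( A_\ell\, x^{(\ell)};\, \xi_\ell \big),
  \qquad \ell = 0, 1, \dots, L-1,
\]
% and the quantities of interest are mutual informations across stages,
% e.g. I(x^{(0)}; x^{(L)}).
```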
Fundamental limits of overparametrized shallow neural networks for supervised learning
We carry out an information-theoretical analysis of a two-layer neural network trained from
input-output pairs generated by a teacher network with matching architecture, in …
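A hedged sketch of the teacher–student data-generation setup the abstract describes: a two-layer teacher produces the input–output pairs, and a student with matching architecture would be estimated from them. Architecture, sizes, activation, and noise level are illustrative assumptions, not the paper's choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 50, 10, 2000   # input dim, hidden width, sample size (illustrative)

# Teacher network: y = v^T tanh(W x) + noise.
W = rng.standard_normal((m, d)) / np.sqrt(d)
v = rng.standard_normal(m) / np.sqrt(m)

X = rng.standard_normal((n, d))
Y = np.tanh(X @ W.T) @ v + 0.1 * rng.standard_normal(n)

# A student with matching architecture would be fit to (X, Y); the paper
# studies information-theoretic limits of this estimation problem, not a
# particular training algorithm.
print(X.shape, Y.shape)
```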
Mutual information as a function of matrix SNR for linear Gaussian channels
This paper focuses on the mutual information and minimum mean-squared error (MMSE) as
a function of a matrix-valued signal-to-noise ratio (SNR) for a linear Gaussian channel with …
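For orientation, the channel in question and the relation that makes the SNR parameterization natural; the scalar identity is the classical I-MMSE relation of Guo, Shamai, and Verdú, and the matrix form is stated here only schematically.

```latex
% Linear Gaussian channel with a positive semidefinite matrix SNR S:
\[
  y \;=\; \sqrt{S}\, x + n, \qquad n \sim \mathcal{N}(0, I).
\]
% Scalar I-MMSE relation:
\[
  \frac{\mathrm{d}}{\mathrm{d}\,\mathrm{snr}}\, I(\mathrm{snr})
  \;=\; \tfrac{1}{2}\, \mathrm{mmse}(\mathrm{snr}),
\]
% and in the matrix case the gradient of I(S) is tied analogously to the
% MMSE matrix E[(x - E[x|y])(x - E[x|y])^T].
```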