Machine learning for synthetic data generation: a review

Y Lu, M Shen, H Wang, X Wang, C van Rechem… - arXiv preprint arXiv …, 2023 - arxiv.org
Machine learning heavily relies on data, but real-world applications often encounter various
data-related issues. These include data of poor quality, insufficient data points leading to …

Privacy for free: How does dataset condensation help privacy?

T Dong, B Zhao, L Lyu - International Conference on …, 2022 - proceedings.mlr.press
To prevent unintentional data leakage, the research community has resorted to data generators
that can produce differentially private data for model training. However, for the sake of the …

Data distillation: A survey

N Sachdeva, J McAuley - arXiv preprint arXiv:2301.04272, 2023 - arxiv.org
The popularity of deep learning has led to the curation of a vast number of massive and
multifarious datasets. Despite having close-to-human performance on individual tasks …

Differentially private diffusion models

T Dockhorn, T Cao, A Vahdat, K Kreis - arXiv preprint arXiv:2210.09929, 2022 - arxiv.org
While modern machine learning models rely on increasingly large training datasets, data is
often limited in privacy-sensitive domains. Generative models trained with differential privacy …

Systematic review of generative modelling tools and utility metrics for fully synthetic tabular data

AD Lautrup, T Hyrup, A Zimek… - ACM Computing …, 2024 - dl.acm.org
Sharing data with third parties is essential for advancing science, but it is becoming more
and more difficult with the rise of data protection regulations, ethical restrictions, and growing …

Benchmarking differentially private synthetic data generation algorithms

Y Tao, R McKenna, M Hay, A Machanavajjhala… - arXiv preprint arXiv …, 2021 - arxiv.org
This work presents a systematic benchmark of differentially private synthetic data generation
algorithms that can generate tabular data. Utility of the synthetic data is evaluated by …

GS-WGAN: A gradient-sanitized approach for learning differentially private generators

D Chen, T Orekondy, M Fritz - Advances in Neural …, 2020 - proceedings.neurips.cc
The wide-spread availability of rich data has fueled the growth of machine learning
applications in numerous domains. However, growth in domains with highly-sensitive data …

Differentially private diffusion models generate useful synthetic images

S Ghalebikesabi, L Berrada, S Gowal, I Ktena… - arXiv preprint arXiv …, 2023 - arxiv.org
The ability to generate privacy-preserving synthetic versions of sensitive image datasets
could unlock numerous ML applications currently constrained by data availability. Due to …

Don't generate me: Training differentially private generative models with sinkhorn divergence

T Cao, A Bie, A Vahdat, S Fidler… - Advances in Neural …, 2021 - proceedings.neurips.cc
Although machine learning models trained on massive data have led to breakthroughs in
several areas, their deployment in privacy-sensitive domains remains limited due to …

Private set generation with discriminative information

D Chen, R Kerkouche, M Fritz - Advances in Neural …, 2022 - proceedings.neurips.cc
Differentially private data generation techniques have become a promising solution to the
data privacy challenge: it enables sharing of data while complying with rigorous privacy …