Machine learning for synthetic data generation: a review
Machine learning heavily relies on data, but real-world applications often encounter various
data-related issues. These include data of poor quality, insufficient data points leading to …
Privacy for free: How does dataset condensation help privacy?
To prevent unintentional data leakage, the research community has resorted to data generators
that can produce differentially private data for model training. However, for the sake of the …
Data distillation: A survey
N Sachdeva, J McAuley - arXiv preprint arXiv:2301.04272, 2023 - arxiv.org
The popularity of deep learning has led to the curation of a vast number of massive and
multifarious datasets. Despite having close-to-human performance on individual tasks …
Differentially private diffusion models
While modern machine learning models rely on increasingly large training datasets, data is
often limited in privacy-sensitive domains. Generative models trained with differential privacy …
Systematic review of generative modelling tools and utility metrics for fully synthetic tabular data
Sharing data with third parties is essential for advancing science, but it is becoming more
and more difficult with the rise of data protection regulations, ethical restrictions, and growing …
Benchmarking differentially private synthetic data generation algorithms
This work presents a systematic benchmark of differentially private synthetic data generation
algorithms that can generate tabular data. Utility of the synthetic data is evaluated by …
Gs-wgan: A gradient-sanitized approach for learning differentially private generators
The wide-spread availability of rich data has fueled the growth of machine learning
applications in numerous domains. However, growth in domains with highly-sensitive data …
Differentially private diffusion models generate useful synthetic images
The ability to generate privacy-preserving synthetic versions of sensitive image datasets
could unlock numerous ML applications currently constrained by data availability. Due to …
Don't generate me: Training differentially private generative models with sinkhorn divergence
Although machine learning models trained on massive data have led to breakthroughs in
several areas, their deployment in privacy-sensitive domains remains limited due to …
Private set generation with discriminative information
Differentially private data generation techniques have become a promising solution to the
data privacy challenge: it enables sharing of data while complying with rigorous privacy …