Pros and cons of GAN evaluation measures: New developments

A Borji - Computer Vision and Image Understanding, 2022 - Elsevier
This work is an update of my previous paper on the same topic published a few years ago
(Borji, 2019). With the dramatic progress in generative modeling, a suite of new quantitative …

Extracting training data from diffusion models

N Carlini, J Hayes, M Nasr, M Jagielski… - 32nd USENIX Security …, 2023 - usenix.org
Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted
significant attention due to their ability to generate high-quality synthetic images. In this work …

Enhanced membership inference attacks against machine learning models

J Ye, A Maddi, SK Murakonda… - Proceedings of the …, 2022 - dl.acm.org
How much does a machine learning algorithm leak about its training data, and why?
Membership inference attacks are used as an auditing tool to quantify this leakage. In this …
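The auditing idea the snippet describes can be illustrated with the simplest form of membership inference: a loss-threshold test, where records the model fits unusually well are guessed to be training members. This is a minimal toy sketch, not the attack construction from the paper; the losses and the hand-picked threshold are invented for illustration (real audits estimate the threshold, e.g. from shadow or reference models).

```python
import numpy as np

def loss_threshold_mia(losses, threshold):
    """Guess membership: records with loss below the threshold are
    predicted to be training members, since models typically fit
    their training data better than unseen data."""
    return losses < threshold

# Toy per-record losses: members tend to have lower loss than non-members.
member_losses = np.array([0.05, 0.10, 0.20])
nonmember_losses = np.array([0.80, 1.20, 0.95])

preds_members = loss_threshold_mia(member_losses, threshold=0.5)
preds_nonmembers = loss_threshold_mia(nonmember_losses, threshold=0.5)

# True positive rate on members, false positive rate on non-members.
tpr = preds_members.mean()
fpr = preds_nonmembers.mean()
print(tpr, fpr)  # → 1.0 0.0 on this toy data
```

The gap between TPR and FPR is one crude measure of how much the model leaks about its training set; a model that generalizes perfectly would give the attacker no such gap.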

Structure-informed language models are protein designers

Z Zheng, Y Deng, D Xue, Y Zhou… - … on machine learning, 2023 - proceedings.mlr.press
This paper demonstrates that language models are strong structure-based protein
designers. We present LM-Design, a generic approach to reprogramming sequence-based …

Training data influence analysis and estimation: A survey

Z Hammoudeh, D Lowd - Machine Learning, 2024 - Springer
Good models require good training data. For overparameterized deep models, the causal
relationship between training data and model predictions is increasingly opaque and poorly …

Training data extraction from pre-trained language models: A survey

S Ishihara - arXiv preprint arXiv:2305.16157, 2023 - arxiv.org
As the deployment of pre-trained language models (PLMs) expands, pressing security
concerns have arisen regarding the potential for malicious extraction of training data, posing …

Practical membership inference attacks against fine-tuned large language models via self-prompt calibration

W Fu, H Wang, C Gao, G Liu, Y Li, T Jiang - arXiv preprint arXiv …, 2023 - arxiv.org
Membership Inference Attacks (MIA) aim to infer whether a target data record has been
utilized for model training or not. Prior attempts have quantified the privacy risks of language …
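The calibration idea behind attacks like this one can be sketched generically: instead of thresholding the target model's loss directly, compare it against a reference model's loss on the same record, so that intrinsically easy records are not mistaken for memorized ones. This is a hedged illustration of difficulty calibration in general, not the paper's self-prompt construction (which derives the reference from the fine-tuned LLM itself); all numbers here are invented.

```python
def calibrated_mia_score(target_loss, reference_loss):
    """Membership score: how much lower the target model's loss is
    than a reference model's loss on the same record. A large positive
    score suggests the target model has memorized the record rather
    than the record simply being easy."""
    return reference_loss - target_loss

# Toy example: an intrinsically easy record (low loss under both models)
# versus a record the target model has plausibly memorized.
easy_score = calibrated_mia_score(target_loss=0.1, reference_loss=0.15)
memorized_score = calibrated_mia_score(target_loss=0.1, reference_loss=1.3)

print(easy_score, memorized_score)  # memorized record scores far higher
```

An uncalibrated threshold on target loss alone would treat both records identically (both have loss 0.1); the reference model is what separates "easy" from "memorized".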

Feature likelihood score: Evaluating the generalization of generative models using samples

M Jiralerspong, J Bose, I Gemp, C Qin… - Advances in …, 2024 - proceedings.neurips.cc
The past few years have seen impressive progress in the development of deep generative
models capable of producing high-dimensional, complex, and photo-realistic data. However …

Generating realistic neurophysiological time series with denoising diffusion probabilistic models

J Vetter, JH Macke, R Gao - Patterns, 2024 - cell.com
Denoising diffusion probabilistic models (DDPMs) have recently been shown to accurately
generate complicated data such as images, audio, or time series. Experimental and clinical …

Diffusion probabilistic models generalize when they fail to memorize

TH Yoon, JY Choi, S Kwon, EK Ryu - ICML 2023 Workshop on …, 2023 - openreview.net
In this work, we study the training of diffusion probabilistic models through a series of
hypotheses and carefully designed experiments. We call our key finding the memorization …