Rethinking machine unlearning for large language models

S Liu, Y Yao, J Jia, S Casper, N Baracaldo… - arXiv preprint arXiv …, 2024 - arxiv.org
We explore machine unlearning (MU) in the domain of large language models (LLMs),
referred to as LLM unlearning. This initiative aims to eliminate undesirable data influence …
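Much of this literature evaluates simple unlearning baselines; a common one is gradient ascent on the data to be forgotten. The sketch below is a generic illustration of that baseline, not the survey's specific proposal; the model, forget-set text, learning rate, and step count are placeholders.

```python
# Minimal gradient-ascent unlearning sketch: push the model's loss UP on the
# forget set. Model, data, learning rate, and number of steps are assumptions;
# this is a generic baseline, not the survey's specific recommendation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

forget_texts = ["example sentence whose influence should be removed"]  # placeholder data

model.train()
for text in forget_texts:
    enc = tokenizer(text, return_tensors="pt")
    loss = model(**enc, labels=enc["input_ids"]).loss
    (-loss).backward()          # ascend on the forget-set loss instead of descending
    optimizer.step()
    optimizer.zero_grad()
```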

Blind baselines beat membership inference attacks for foundation models

D Das, J Zhang, F Tramèr - arXiv preprint arXiv:2406.16201, 2024 - arxiv.org
Membership inference (MI) attacks try to determine if a data sample was used to train a
machine learning model. For foundation models trained on unknown Web data, MI attacks …
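For context, the attacks these blind baselines are compared against often reduce to thresholding a per-sample score such as the model's loss. A minimal sketch of that loss-threshold attack follows; the model name, threshold value, and candidate text are illustrative assumptions.

```python
# Minimal loss-threshold membership inference sketch (illustrative; the model
# name, threshold, and candidate text are assumptions, not from the paper).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # assumed stand-in for a foundation model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def sample_loss(text: str) -> float:
    """Average per-token negative log-likelihood of `text` under the model."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return out.loss.item()

def predict_member(text: str, threshold: float = 3.0) -> bool:
    """Flag `text` as a training member if its loss falls below the threshold."""
    return sample_loss(text) < threshold

print(predict_member("The quick brown fox jumps over the lazy dog."))
```

The paper's argument is that on common foundation-model benchmarks the member and non-member sets differ in distribution, so baselines that never query the target model can already beat attacks of this kind.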

ReCaLL: Membership inference via relative conditional log-likelihoods

R Xie, J Wang, R Huang, M Zhang, R Ge, J Pei… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid scaling of large language models (LLMs) has raised concerns about the
transparency and fair use of their pretraining data. Detecting such …
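The title points to a relative conditional log-likelihood score: compare the target text's likelihood on its own with its likelihood when conditioned on a known non-member prefix. The sketch below only illustrates the quantity being compared; the prefix text, the model, and how the ratio is thresholded are assumptions rather than the paper's exact recipe.

```python
# Sketch of a relative conditional log-likelihood score: the target's average NLL
# with and without a known non-member prefix. Prefix, model, and decision rule
# are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def avg_nll(text: str, prefix: str = "") -> float:
    """Average NLL of `text`'s tokens, optionally conditioned on `prefix`."""
    target_ids = tokenizer(text, return_tensors="pt")["input_ids"]
    if prefix:
        prefix_ids = tokenizer(prefix, return_tensors="pt")["input_ids"]
        input_ids = torch.cat([prefix_ids, target_ids], dim=1)
        labels = input_ids.clone()
        labels[:, : prefix_ids.shape[1]] = -100  # score only the target tokens
    else:
        input_ids, labels = target_ids, target_ids
    with torch.no_grad():
        return model(input_ids, labels=labels).loss.item()

def recall_style_score(text: str, nonmember_prefix: str) -> float:
    """Ratio of conditional to unconditional NLL for the target text."""
    return avg_nll(text, prefix=nonmember_prefix) / avg_nll(text)
```

How members and non-members separate under this ratio, and how the threshold is calibrated, follow the paper; the code only shows the quantity being contrasted.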

Con-ReCall: Detecting Pre-training Data in LLMs via Contrastive Decoding

C Wang, Y Wang, B Hooi, Y Cai, N Peng… - arXiv preprint arXiv …, 2024 - arxiv.org
The training data in large language models is key to their success, but it also presents
privacy and security risks, as it may contain sensitive information. Detecting pre-training data …
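Building on the same idea, a contrastive variant can condition the target on both a member-like prefix and a non-member prefix and compare the two. The sketch below uses a simple difference of conditional losses; the prefixes and the combination rule are assumptions, not the paper's exact score.

```python
# Hedged sketch of a contrastive membership score: the target's conditional NLL
# under a non-member prefix minus its NLL under a member-like prefix. Prefixes,
# model, and the simple difference are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def conditional_nll(text: str, prefix: str) -> float:
    """Average NLL of `text` tokens given `prefix` (prefix tokens are masked out)."""
    p = tokenizer(prefix, return_tensors="pt")["input_ids"]
    t = tokenizer(text, return_tensors="pt")["input_ids"]
    ids = torch.cat([p, t], dim=1)
    labels = ids.clone()
    labels[:, : p.shape[1]] = -100
    with torch.no_grad():
        return model(ids, labels=labels).loss.item()

def contrastive_score(text: str, member_prefix: str, nonmember_prefix: str) -> float:
    """Positive when a member-like context lowers the target's NLL more than a
    non-member context does."""
    return conditional_nll(text, nonmember_prefix) - conditional_nll(text, member_prefix)
```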

Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?

MA Panaitescu-Liess, Z Che, B An, Y Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated impressive capabilities in generating
diverse and contextually rich text. However, concerns regarding copyright infringement arise …

Context-Aware Membership Inference Attacks against Pre-trained Large Language Models

H Chang, AS Shamsabadi, K Katevas… - arXiv preprint arXiv …, 2024 - arxiv.org
Prior Membership Inference Attacks (MIAs) on pre-trained Large Language Models (LLMs),
adapted from classification model attacks, fail because they ignore the generative process of …

Scaling Up Membership Inference: When and How Attacks Succeed on Large Language Models

H Puerto, M Gubri, S Yun, SJ Oh - arXiv preprint arXiv:2411.00154, 2024 - arxiv.org
Membership inference attacks (MIA) attempt to verify whether a given data sample
belongs to a model's training set. MIA has become relevant in recent years, following the rapid …
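One plausible reading of "scaling up" is moving from single-sample decisions to decisions about whole document collections by aggregating per-document scores. The sketch below shows such an aggregation; the scoring function, averaging rule, and threshold are assumptions rather than the paper's procedure.

```python
# Hedged sketch of collection-level membership inference: average a per-document
# membership signal over a whole collection and threshold the mean. Model,
# scoring function, and threshold are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def doc_score(text: str) -> float:
    """Per-document signal: negative average token NLL (higher = more member-like)."""
    enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        return -model(**enc, labels=enc["input_ids"]).loss.item()

def collection_is_member(docs: list[str], threshold: float) -> bool:
    """Decide membership for an entire collection from the mean per-document score."""
    scores = [doc_score(d) for d in docs]
    return sum(scores) / len(scores) > threshold
```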

Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data

J Zhang, D Das, G Kamath, F Tramèr - arXiv preprint arXiv:2409.19798, 2024 - arxiv.org
We consider the problem of a training data proof, where a data creator or owner wants to
demonstrate to a third party that some machine learning model was trained on their data …

Position: LLM Unlearning Benchmarks are Weak Measures of Progress

P Thaker, S Hu, N Kale, Y Maurya, ZS Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Unlearning methods have the potential to improve the privacy and safety of large language
models (LLMs) by removing sensitive or harmful information post hoc. The LLM unlearning …

Semantic Membership Inference Attack against Large Language Models

H Mozaffari, VJ Marathe - arXiv preprint arXiv:2406.10218, 2024 - arxiv.org
Membership Inference Attacks (MIAs) determine whether a specific data point was included
in the training set of a target model. In this paper, we introduce the Semantic Membership …