Fine-tuning aligned language models compromises safety, even when users do not intend to!

X Qi, Y Zeng, T Xie, PY Chen, R Jia, P Mittal… - arXiv preprint arXiv …, 2023 - arxiv.org
Optimizing large language models (LLMs) for downstream use cases often involves the
customization of pre-trained LLMs through further fine-tuning. Meta's open release of Llama …

Data and model poisoning backdoor attacks on wireless federated learning, and the defense mechanisms: A comprehensive survey

Y Wan, Y Qu, W Ni, Y Xiang, L Gao… - … Surveys & Tutorials, 2024 - ieeexplore.ieee.org
Due to the greatly improved capabilities of devices, the sheer volume of data, and increasing concerns about data privacy, Federated Learning (FL) has been increasingly considered for …

Narcissus: A practical clean-label backdoor attack with limited information

Y Zeng, M Pan, HA Just, L Lyu, M Qiu… - Proceedings of the 2023 …, 2023 - dl.acm.org
Backdoor attacks introduce manipulated data into a machine learning model's training set,
causing the model to misclassify inputs with a trigger during testing to achieve a desired …
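The snippet above describes the generic backdoor mechanism: manipulated training samples carrying a trigger cause the model to misclassify triggered inputs at test time. The short Python sketch below illustrates that generic mechanism with a BadNets-style patch trigger plus label flipping; it is not the clean-label Narcissus attack itself (which keeps the original labels and optimizes its trigger), and the image shape, patch size, and poison rate are illustrative assumptions.

# Minimal, generic sketch of trigger-based data poisoning (BadNets-style
# patch trigger plus label flipping). NOT the clean-label Narcissus attack;
# it only illustrates the basic backdoor mechanism the snippet describes.
import numpy as np

def poison_dataset(images, labels, target_class=0, poison_rate=0.05,
                   patch_size=3, patch_value=1.0, seed=0):
    """Stamp a small white patch onto a random subset of training images
    and relabel those samples as the attacker's target class."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * poison_rate)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Place the trigger in the bottom-right corner of each selected image.
    images[idx, -patch_size:, -patch_size:, :] = patch_value
    labels[idx] = target_class
    return images, labels, idx

if __name__ == "__main__":
    # Toy data standing in for a real training set (e.g. 32x32 RGB images).
    x = np.random.rand(1000, 32, 32, 3).astype(np.float32)
    y = np.random.randint(0, 10, size=1000)
    x_p, y_p, poisoned_idx = poison_dataset(x, y)
    print(f"poisoned {len(poisoned_idx)} of {len(x_p)} samples")

A model trained on the poisoned set behaves normally on clean inputs but predicts the target class whenever the patch appears, which is the failure mode the detection and defense papers below address.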

Domain watermark: Effective and harmless dataset copyright protection is closed at hand

J Guo, Y Li, L Wang, ST Xia… - Advances in Neural …, 2024 - proceedings.neurips.cc
The prosperity of deep neural networks (DNNs) largely stems from open-source datasets, based on which users can evaluate and improve their methods. In this paper, we …

Label poisoning is all you need

R Jha, J Hayase, S Oh - Advances in Neural Information …, 2023 - proceedings.neurips.cc
In a backdoor attack, an adversary injects corrupted data into a model's training dataset in
order to gain control over its predictions on images with a specific attacker-defined trigger. A …

Backdoor defense via adaptively splitting poisoned dataset

K Gao, Y Bai, J Gu, Y Yang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Backdoor defenses have been studied to alleviate the threat of deep neural networks (DNNs) being backdoored and thus maliciously altered. Since DNNs usually adopt …

Shared adversarial unlearning: Backdoor mitigation by unlearning shared adversarial examples

S Wei, M Zhang, H Zha, B Wu - Advances in Neural …, 2023 - proceedings.neurips.cc
Backdoor attacks are serious security threats to machine learning models where an
adversary can inject poisoned samples into the training set, causing a backdoored model …

Reconstructive neuron pruning for backdoor defense

Y Li, X Lyu, X Ma, N Koren, L Lyu… - … on Machine Learning, 2023 - proceedings.mlr.press
Deep neural networks (DNNs) have been found to be vulnerable to backdoor attacks,
raising security concerns about their deployment in mission-critical applications. While …
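Since the snippet only hints at how pruning serves as a defense, here is a minimal Python sketch of activation-based neuron pruning in the spirit of fine-pruning: zero out hidden units that stay dormant on trusted clean data, on the assumption that a backdoor may have co-opted them. It is not the reconstructive neuron pruning (RNP) procedure of the paper above, and the tiny MLP and pruning ratio are illustrative assumptions.

# Sketch of activation-based neuron pruning (fine-pruning style), shown only
# to make "pruning neurons as a backdoor defense" concrete. NOT the RNP
# method of the paper above.
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer network: x -> ReLU(x @ W1 + b1) -> logits.
W1, b1 = rng.normal(size=(20, 64)), np.zeros(64)
W2, b2 = rng.normal(size=(64, 10)), np.zeros(10)

def hidden_activations(x):
    return np.maximum(0.0, x @ W1 + b1)

# Small set of trusted clean inputs held by the defender.
clean_x = rng.normal(size=(256, 20))

# Rank hidden units by their mean activation on clean data; units that stay
# (nearly) dormant on clean inputs are pruning candidates.
mean_act = hidden_activations(clean_x).mean(axis=0)
prune_ratio = 0.3
n_prune = int(len(mean_act) * prune_ratio)
prune_idx = np.argsort(mean_act)[:n_prune]

# "Prune" by zeroing the outgoing weights of the selected hidden units.
W2[prune_idx, :] = 0.0
print(f"pruned {n_prune} of {len(mean_act)} hidden units")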

Not all samples are born equal: Towards effective clean-label backdoor attacks

Y Gao, Y Li, L Zhu, D Wu, Y Jiang, ST Xia - Pattern Recognition, 2023 - Elsevier
Recent studies demonstrated that deep neural networks (DNNs) are vulnerable to backdoor
attacks. The attacked model behaves normally on benign samples, while its predictions are …

Scale-up: An efficient black-box input-level backdoor detection via analyzing scaled prediction consistency

J Guo, Y Li, X Chen, H Guo, L Sun, C Liu - arXiv preprint arXiv:2302.03251, 2023 - arxiv.org
Deep neural networks (DNNs) are vulnerable to backdoor attacks, where adversaries
embed a hidden backdoor trigger during the training process for malicious prediction …
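To make the scaled-prediction-consistency idea named in the title concrete, the Python sketch below amplifies an input's pixel values by several factors and measures how often the prediction stays the same; triggered inputs tend to keep a consistent prediction under amplification, so high consistency is flagged. The scaling set, clipping range, threshold, and `model` interface are assumptions, not the paper's exact configuration.

# Sketch of input-level backdoor detection via scaled prediction consistency,
# assuming only a black-box `model` that maps a batch of images to logits.
import numpy as np

def spc_score(model, x, scales=(2, 3, 4, 5, 6), low=0.0, high=1.0):
    """Fraction of amplified copies of `x` whose predicted class matches
    the prediction on the original input."""
    base_pred = np.argmax(model(x[None, ...]), axis=-1)[0]
    matches = 0
    for s in scales:
        x_scaled = np.clip(x * s, low, high)  # amplify then keep a valid image
        pred = np.argmax(model(x_scaled[None, ...]), axis=-1)[0]
        matches += int(pred == base_pred)
    return matches / len(scales)

def flag_backdoored(model, x, threshold=0.8):
    """Flag the input as suspicious if its consistency score is high."""
    return spc_score(model, x) >= threshold

if __name__ == "__main__":
    # Stand-in "model": any callable returning logits for a batch of images.
    rng = np.random.default_rng(0)
    W = rng.normal(size=(32 * 32 * 3, 10))
    model = lambda batch: batch.reshape(len(batch), -1) @ W
    x = rng.random((32, 32, 3)).astype(np.float32)
    print("SPC score:", spc_score(model, x), "flagged:", flag_backdoored(model, x))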