Fine-tuning aligned language models compromises safety, even when users do not intend to!
Optimizing large language models (LLMs) for downstream use cases often involves the
customization of pre-trained LLMs through further fine-tuning. Meta's open release of Llama …
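As a rough illustration of the customization step described above, here is a minimal supervised fine-tuning sketch assuming a HuggingFace causal-LM checkpoint; the model name, toy dataset, and hyperparameters are placeholders, not the paper's setup.

```python
# Minimal supervised fine-tuning sketch (assumptions: HuggingFace transformers/datasets
# installed and a causal-LM checkpoint available; model name and data are placeholders).
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

model_name = "meta-llama/Llama-2-7b-hf"   # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tiny illustrative instruction-style dataset.
records = [{"text": "### Instruction: Greet the user.\n### Response: Hello!"}]
ds = Dataset.from_list(records)

def tokenize(batch):
    out = tokenizer(batch["text"], truncation=True, max_length=512, padding="max_length")
    # Causal-LM loss targets; pad positions would normally be masked to -100.
    out["labels"] = [ids.copy() for ids in out["input_ids"]]
    return out

ds = ds.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", per_device_train_batch_size=1,
                           num_train_epochs=1, learning_rate=2e-5),
    train_dataset=ds,
)
trainer.train()
```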
Data and model poisoning backdoor attacks on wireless federated learning, and the defense mechanisms: A comprehensive survey
Due to the greatly improved capabilities of devices, the massive volumes of data they hold, and increasing concern
about data privacy, Federated Learning (FL) has been increasingly considered for …
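To make the threat model concrete, the following NumPy sketch runs a few rounds of FedAvg on a toy linear-regression task in which one client boosts (scales) its model update; all names and numbers are illustrative, not taken from the survey.

```python
# Hedged sketch of FedAvg with one malicious client performing model poisoning
# (scaling its update). Toy linear model, NumPy only; everything here is illustrative.
import numpy as np

def local_update(w, X, y, lr=0.1, steps=10):
    """A few steps of gradient descent on a linear-regression client."""
    w = w.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
d, n_clients = 5, 4
w_global = np.zeros(d)
clients = [(rng.normal(size=(20, d)), rng.normal(size=20)) for _ in range(n_clients)]

for rnd in range(3):                      # a few federated rounds
    updates = []
    for i, (X, y) in enumerate(clients):
        w_local = local_update(w_global, X, y)
        delta = w_local - w_global
        if i == 0:                        # malicious client: boost its contribution
            delta *= 10.0                 # a simple model-poisoning attack
        updates.append(delta)
    w_global = w_global + np.mean(updates, axis=0)   # FedAvg aggregation
```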
Narcissus: A practical clean-label backdoor attack with limited information
Backdoor attacks introduce manipulated data into a machine learning model's training set,
causing the model to misclassify inputs with a trigger during testing to achieve a desired …
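For intuition about the trigger/label mechanism, here is a BadNets-style dirty-label poisoning sketch in NumPy; Narcissus itself is clean-label (it keeps the original labels and optimizes the trigger from target-class data alone), so treat this only as the simplest baseline form of the attack.

```python
# BadNets-style poisoning sketch (NumPy only): stamp a small patch on a random subset
# of training images and relabel them to the attacker-chosen target class.
import numpy as np

def poison(images, labels, target_class, rate=0.05, patch_value=1.0, seed=0):
    """Stamp a 3x3 corner patch on a random subset and relabel it to the target class."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    images[idx, -3:, -3:] = patch_value      # the trigger
    labels[idx] = target_class               # the attacker-chosen label
    return images, labels, idx

# Toy usage on random data.
X = np.random.rand(100, 28, 28)
y = np.random.randint(0, 10, size=100)
X_poisoned, y_poisoned, poisoned_idx = poison(X, y, target_class=0)
```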
Domain watermark: Effective and harmless dataset copyright protection is closed at hand
The prosperity of deep neural networks (DNNs) has largely benefited from open-source
datasets, on which users can evaluate and improve their methods. In this paper, we …
Label poisoning is all you need
In a backdoor attack, an adversary injects corrupted data into a model's training dataset in
order to gain control over its predictions on images with a specific attacker-defined trigger. A …
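Since this attack corrupts only labels, a minimal sketch needs nothing more than relabeling a chosen subset; the selection below is a placeholder, not the paper's sample-selection criterion.

```python
# Label-only corruption sketch: the adversary never modifies inputs, only labels.
import numpy as np

def flip_labels(labels, chosen_idx, target_class):
    labels = labels.copy()
    labels[chosen_idx] = target_class
    return labels

y = np.random.randint(0, 10, size=1000)
chosen = np.arange(0, 1000, 100)            # placeholder subset; the attack's power
y_poisoned = flip_labels(y, chosen, 0)      # comes from how this subset is chosen
```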
Backdoor defense via adaptively splitting poisoned dataset
Backdoor defenses have been studied to alleviate the threat of deep neural networks
(DNNs) being backdoored and thus maliciously altered. Since DNNs usually adopt …
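A simplified view of the splitting idea: compute per-sample losses with the current model and partition the training set into a trusted pool and a suspicious pool each epoch. The PyTorch sketch below is a generic loss-based split, not the paper's adaptive procedure, and assumes the loader yields (index, (x, y)) pairs.

```python
# Generic loss-based dataset-splitting sketch (PyTorch). A simplified stand-in for
# splitting defenses: per-sample losses are computed with the current model and the
# dataset is partitioned into a trusted pool and a suspicious pool.
import torch
import torch.nn.functional as F

def split_by_loss(model, loader, suspicious_fraction=0.1, device="cpu"):
    """Return index tensors (trusted, suspicious) based on per-sample cross-entropy."""
    model.eval()
    losses, indices = [], []
    with torch.no_grad():
        for idx, (x, y) in loader:          # assumption: loader yields (index, (x, y))
            logits = model(x.to(device))
            loss = F.cross_entropy(logits, y.to(device), reduction="none")
            losses.append(loss.cpu())
            indices.append(idx)
    losses = torch.cat(losses)
    indices = torch.cat(indices)
    k = int(suspicious_fraction * len(losses))
    order = torch.argsort(losses)           # low-loss samples are flagged as suspicious,
    suspicious = indices[order[:k]]         # since poisoned samples are often fit fastest
    trusted = indices[order[k:]]
    return trusted, suspicious
```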
Shared adversarial unlearning: Backdoor mitigation by unlearning shared adversarial examples
Backdoor attacks are serious security threats to machine learning models where an
adversary can inject poisoned samples into the training set, causing a backdoored model …
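The mitigation idea can be sketched as: craft adversarial perturbations on a small clean set and fine-tune the model so the perturbed inputs are mapped back to their true labels, since backdoor triggers behave like adversarial shortcuts. The PyTorch code below captures that spirit only; it is not the paper's exact shared-adversarial-example objective.

```python
# Hedged sketch of adversarial-example-driven backdoor mitigation (PyTorch).
import torch
import torch.nn.functional as F

def pgd(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Untargeted L-inf PGD starting from the clean input."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        delta.data = (delta + alpha * delta.grad.sign()).clamp(-eps, eps)
        delta.grad.zero_()
    return (x + delta).detach()

def mitigation_step(model, optimizer, x_clean, y_clean):
    x_adv = pgd(model, x_clean, y_clean)
    optimizer.zero_grad()
    # Push adversarial (trigger-like) inputs back to their correct labels.
    loss = F.cross_entropy(model(x_adv), y_clean)
    loss.backward()
    optimizer.step()
    return loss.item()
```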
Reconstructive neuron pruning for backdoor defense
Deep neural networks (DNNs) have been found to be vulnerable to backdoor attacks,
raising security concerns about their deployment in mission-critical applications. While …
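A much-simplified pruning sketch in PyTorch: zero out the channels of a chosen convolutional layer that stay least active on a small clean set. This shows the generic neuron-pruning defense pattern only; the paper instead exposes backdoor neurons through an unlearn-then-recover (reconstruction) procedure before pruning.

```python
# Simplified activation-based pruning sketch (PyTorch); not the paper's exact method.
import torch

@torch.no_grad()
def prune_dormant_channels(model, layer, clean_loader, prune_ratio=0.2, device="cpu"):
    """Zero out the output channels of `layer` that are least active on clean data."""
    acts = []
    hook = layer.register_forward_hook(
        lambda m, i, o: acts.append(o.abs().mean(dim=(0, 2, 3)))  # per-channel activation
    )
    for x, _ in clean_loader:
        model(x.to(device))
    hook.remove()
    mean_act = torch.stack(acts).mean(dim=0)
    k = int(prune_ratio * mean_act.numel())
    dormant = torch.argsort(mean_act)[:k]        # least-active channels
    layer.weight[dormant] = 0.0                  # zero their filters
    if layer.bias is not None:
        layer.bias[dormant] = 0.0
    return dormant
```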
Not all samples are born equal: Towards effective clean-label backdoor attacks
Recent studies demonstrated that deep neural networks (DNNs) are vulnerable to backdoor
attacks. The attacked model behaves normally on benign samples, while its predictions are …
Scale-up: An efficient black-box input-level backdoor detection via analyzing scaled prediction consistency
Deep neural networks (DNNs) are vulnerable to backdoor attacks, where adversaries
embed a hidden backdoor trigger during the training process for malicious prediction …
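The detection signal is easy to sketch: amplify an input's pixel values by several factors and check how often the model's predicted label stays the same; triggered inputs tend to keep their prediction under amplification. The factors and threshold below are illustrative choices, not the paper's tuned values.

```python
# Hedged sketch of scaled-prediction-consistency detection (PyTorch).
import torch

@torch.no_grad()
def consistency_score(model, x, factors=(3, 5, 7, 9, 11)):
    """x: a batch of images in [0, 1]; returns a score in [0, 1] per sample."""
    base_pred = model(x).argmax(dim=1)
    agree = torch.zeros_like(base_pred, dtype=torch.float)
    for s in factors:
        scaled = (x * s).clamp(0.0, 1.0)            # pixel-wise amplification
        agree += (model(scaled).argmax(dim=1) == base_pred).float()
    return agree / len(factors)

def flag_backdoored(model, x, threshold=0.8):
    """High consistency under scaling marks an input as suspicious."""
    return consistency_score(model, x) >= threshold
```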