Backdoor attacks and countermeasures on deep learning: A comprehensive review

Y Gao, BG Doan, Z Zhang, S Ma, J Zhang, A Fu… - arXiv preprint arXiv …, 2020 - arxiv.org
This work provides the community with a timely, comprehensive review of backdoor attacks
and countermeasures on deep learning. According to the attacker's capability and affected …

The threat of offensive AI to organizations

Y Mirsky, A Demontis, J Kotak, R Shankar, D Gelei… - Computers & …, 2023 - Elsevier
AI has provided us with the ability to automate tasks, extract information from vast amounts of
data, and synthesize media that is nearly indistinguishable from the real thing. However …

Unsolved problems in ML safety

D Hendrycks, N Carlini, J Schulman… - arXiv preprint arXiv …, 2021 - arxiv.org
Machine learning (ML) systems are rapidly increasing in size, are acquiring new
capabilities, and are increasingly deployed in high-stakes settings. As with other powerful …

Trustworthy LLMs: A survey and guideline for evaluating large language models' alignment

Y Liu, Y Yao, JF Ton, X Zhang, R Guo, H Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Ensuring alignment, which refers to making models behave in accordance with human
intentions [1, 2], has become a critical task before deploying large language models (LLMs) …

LIRA: Learnable, imperceptible and robust backdoor attacks

K Doan, Y Lao, W Zhao, P Li - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Recently, machine learning models have been shown to be vulnerable to backdoor
attacks, primarily due to the lack of transparency in black-box models such as deep neural …

Backdoor learning: A survey

Y Li, Y Jiang, Z Li, ST Xia - IEEE Transactions on Neural …, 2022 - ieeexplore.ieee.org
A backdoor attack intends to embed hidden backdoors into deep neural networks (DNNs) so
that the attacked models perform well on benign samples, whereas their predictions will be …

BackdoorBench: A comprehensive benchmark of backdoor learning

B Wu, H Chen, M Zhang, Z Zhu, S Wei… - Advances in …, 2022 - proceedings.neurips.cc
Backdoor learning is an emerging and vital topic for studying the vulnerability of deep
neural networks (DNNs). Many pioneering backdoor attack and defense methods are being …

BadEncoder: Backdoor attacks to pre-trained encoders in self-supervised learning

J Jia, Y Liu, NZ Gong - 2022 IEEE Symposium on Security and …, 2022 - ieeexplore.ieee.org
Self-supervised learning in computer vision aims to pre-train an image encoder using a
large number of unlabeled images or (image, text) pairs. The pre-trained image encoder can …

Label poisoning is all you need

R Jha, J Hayase, S Oh - Advances in Neural Information …, 2023 - proceedings.neurips.cc
In a backdoor attack, an adversary injects corrupted data into a model's training dataset in
order to gain control over its predictions on images with a specific attacker-defined trigger. A …

Truth serum: Poisoning machine learning models to reveal their secrets

F Tramèr, R Shokri, A San Joaquin, H Le… - Proceedings of the …, 2022 - dl.acm.org
We introduce a new class of attacks on machine learning models. We show that an
adversary who can poison a training dataset can cause models trained on this dataset to …