Backdoor attacks and countermeasures on deep learning: A comprehensive review

Y Gao, BG Doan, Z Zhang, S Ma, J Zhang, A Fu… - arXiv preprint arXiv …, 2020 - arxiv.org
This work provides the community with a timely, comprehensive review of backdoor attacks
and countermeasures on deep learning. According to the attacker's capability and affected …

The threat of offensive AI to organizations

Y Mirsky, A Demontis, J Kotak, R Shankar, D Gelei… - Computers & …, 2023 - Elsevier
AI has provided us with the ability to automate tasks, extract information from vast amounts of
data, and synthesize media that is nearly indistinguishable from the real thing. However …

Unsolved problems in ML safety

D Hendrycks, N Carlini, J Schulman… - arXiv preprint arXiv …, 2021 - arxiv.org
Machine learning (ML) systems are rapidly increasing in size, are acquiring new
capabilities, and are increasingly deployed in high-stakes settings. As with other powerful …

Trustworthy LLMs: A survey and guideline for evaluating large language models' alignment

Y Liu, Y Yao, JF Ton, X Zhang, R Guo, H Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Ensuring alignment, which refers to making models behave in accordance with human
intentions [1, 2], has become a critical task before deploying large language models (LLMs) …

LIRA: Learnable, imperceptible and robust backdoor attacks

K Doan, Y Lao, W Zhao, P Li - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Recently, machine learning models have been shown to be vulnerable to backdoor
attacks, primarily due to the lack of transparency in black-box models such as deep neural …

Backdoor learning: A survey

Y Li, Y Jiang, Z Li, ST Xia - IEEE Transactions on Neural …, 2022 - ieeexplore.ieee.org
A backdoor attack intends to embed hidden backdoors into deep neural networks (DNNs) so
that the attacked models perform well on benign samples, whereas their predictions will be …

BackdoorBench: A comprehensive benchmark of backdoor learning

B Wu, H Chen, M Zhang, Z Zhu, S Wei… - Advances in …, 2022 - proceedings.neurips.cc
Backdoor learning is an emerging and vital topic for studying the vulnerability of deep
neural networks (DNNs). Many pioneering backdoor attack and defense methods are being …

BadEncoder: Backdoor attacks to pre-trained encoders in self-supervised learning

J Jia, Y Liu, NZ Gong - 2022 IEEE Symposium on Security and …, 2022 - ieeexplore.ieee.org
Self-supervised learning in computer vision aims to pre-train an image encoder using a
large number of unlabeled images or (image, text) pairs. The pre-trained image encoder can …

Label poisoning is all you need

R Jha, J Hayase, S Oh - Advances in Neural Information …, 2023 - proceedings.neurips.cc
In a backdoor attack, an adversary injects corrupted data into a model's training dataset in
order to gain control over its predictions on images with a specific attacker-defined trigger. A …

Truth serum: Poisoning machine learning models to reveal their secrets

F Tramèr, R Shokri, A San Joaquin, H Le… - Proceedings of the …, 2022 - dl.acm.org
We introduce a new class of attacks on machine learning models. We show that an
adversary who can poison a training dataset can cause models trained on this dataset to …