Moderate-fitting as a natural backdoor defender for pre-trained language models

Parameter-efficient fine-tuning methods for pretrained language models: A critical review and assessment

L Xu, H Xie, SZJ Qin, X Tao, FL Wang - arXiv preprint arXiv:2312.12148, 2023 - arxiv.org

With the continuous growth in the number of parameters of transformer-based pretrained
language models (PLMs), particularly the emergence of large language models (LLMs) with …

Cited by 42 | Related articles | All 2 versions

Setting the trap: Capturing and defeating backdoors in pretrained language models through honeypots

RR Tang, J Yuan, Y Li, Z Liu… - Advances in Neural …, 2023 - proceedings.neurips.cc

In the field of natural language processing, the prevalent approach involves fine-tuning
pretrained language models (PLMs) using local samples. Recent research has exposed the …

Cited by 5 | Related articles | All 5 versions

ParaFuzz: An interpretability-driven technique for detecting poisoned samples in NLP

L Yan, Z Zhang, G Tao, K Zhang… - Advances in …, 2024 - proceedings.neurips.cc

Backdoor attacks have emerged as a prominent threat to natural language processing (NLP)
models, where the presence of specific triggers in the input can lead poisoned models to …

Cited by 4 | Related articles | All 7 versions

ChatGPT as an attack tool: Stealthy textual backdoor attack via blackbox generative model trigger

J Li, Y Yang, Z Wu, VG Vydiswaran, C Xiao - arXiv preprint arXiv …, 2023 - arxiv.org

Textual backdoor attacks pose a practical threat to existing systems, as they can
compromise the model by inserting imperceptible triggers into inputs and manipulating …

Cited by 19 | Related articles | All 3 versions

Backdoor Attacks to Deep Neural Networks: A Survey of the Literature, Challenges, and Future Research Directions

O Mengara, A Avila, TH Falk - IEEE Access, 2024 - ieeexplore.ieee.org

Deep neural network (DNN) classifiers are potent instruments that can be used in various
security-sensitive applications. Nonetheless, they are vulnerable to certain attacks that …

Cited by 2 | Related articles | All 2 versions

TIJO: Trigger inversion with joint optimization for defending multimodal backdoored models

I Sur, K Sikka, M Walmer… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present a Multimodal Backdoor defense technique TIJO (Trigger Inversion
using Joint Optimization). Recently Walmer et al. demonstrated successful backdoor attacks …

Cited by 4 | Related articles | All 5 versions

TextGuard: Provable defense against backdoor attacks on text classification

H Pei, J Jia, W Guo, B Li, D Song - arXiv preprint arXiv:2311.11225, 2023 - arxiv.org

Backdoor attacks have become a major security threat for deploying machine learning
models in security-critical applications. Existing research endeavors have proposed many …

Cited by 7 | Related articles | All 4 versions

Black-box backdoor defense via zero-shot image purification

Y Shi, M Du, X Wu, Z Guan, J Sun… - Advances in Neural …, 2024 - proceedings.neurips.cc

Backdoor attacks inject poisoned samples into the training data, resulting in the
misclassification of the poisoned input during a model's deployment. Defending against …

Cited by 8 | Related articles | All 6 versions

Backdoor attacks and countermeasures in natural language processing models: A comprehensive security review

P Cheng, Z Wu, W Du, G Liu - arXiv preprint arXiv:2309.06055, 2023 - arxiv.org

Deep Neural Networks (DNNs) have led to unprecedented progress in various natural
language processing (NLP) tasks. Owing to limited data and computation resources, using …

Cited by 9 | Related articles | All 2 versions

BITE: Textual backdoor attacks with iterative trigger injection

J Yan, V Gupta, X Ren - arXiv preprint arXiv:2205.12700, 2022 - arxiv.org

Backdoor attacks have become an emerging threat to NLP systems. By providing poisoned
training data, the adversary can embed a "backdoor" into the victim model, which allows …

Cited by 17 | Related articles | All 6 versions