A survey of backdoor attacks and defenses on large language models: Implications for security measures

S Zhao, M Jia, Z Guo, L Gan, X Xu, X Wu, J Fu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs), which bridge the gap between human language
understanding and complex problem-solving, achieve state-of-the-art performance on …

Transferring backdoors between large language models by knowledge distillation

P Cheng, Z Wu, T Ju, W Du, ZZG Liu - arXiv preprint arXiv:2408.09878, 2024 - arxiv.org
Backdoor Attacks have been a serious vulnerability against Large Language Models
(LLMs). However, previous methods only reveal such risk in specific models, or present …

Certifiably Robust RAG against Retrieval Corruption

C Xiang, T Wu, Z Zhong, D Wagner, D Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Retrieval-augmented generation (RAG) has been shown vulnerable to retrieval corruption
attacks: an attacker can inject malicious passages into retrieval results to induce inaccurate …

Mitigating backdoor threats to large language models: Advancement and challenges

Q Liu, W Mo, T Tong, J Xu, F Wang… - 2024 60th Annual …, 2024 - ieeexplore.ieee.org
The advancement of Large Language Models (LLMs) has significantly impacted various
domains, including Web search, healthcare, and software development. However, as these …

Typos that Broke the RAG's Back: Genetic Attack on RAG Pipeline by Simulating Documents in the Wild via Low-level Perturbations

S Cho, S Jeong, J Seo, T Hwang, JC Park - arXiv preprint arXiv …, 2024 - arxiv.org
The robustness of recent Large Language Models (LLMs) has become increasingly crucial
as their applicability expands across various domains and real-world applications. Retrieval …

Robust neural information retrieval: An adversarial and out-of-distribution perspective

YA Liu, R Zhang, J Guo, M de Rijke, Y Fan… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in neural information retrieval (IR) models have significantly enhanced
their effectiveness over various IR tasks. The robustness of these models, essential for …

SynGhost: Imperceptible and Universal Task-agnostic Backdoor Attack in Pre-trained Language Models

P Cheng, W Du, Z Wu, F Zhang, L Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Pre-training has been a necessary phase for deploying pre-trained language models
(PLMs) to achieve remarkable performance in downstream tasks. However, we empirically …

Unlearning Backdoor Attacks for LLMs with Weak-to-Strong Knowledge Distillation

S Zhao, X Wu, CD Nguyen, M Jia, Y Feng… - arXiv preprint arXiv …, 2024 - arxiv.org
Parameter-efficient fine-tuning (PEFT) can bridge the gap between large language models
(LLMs) and downstream tasks. However, PEFT has been proven vulnerable to malicious …

Robust Information Retrieval

YA Liu, R Zhang, J Guo, M de Rijke - … of the 47th International ACM SIGIR …, 2024 - dl.acm.org
Beyond effectiveness, the robustness of an information retrieval (IR) system is increasingly
attracting attention. When deployed, a critical technology such as IR should not only deliver …

Weak-to-Strong Backdoor Attack for Large Language Models

S Zhao, L Gan, Z Guo, X Wu, L Xiao, X Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite being widely applied due to their exceptional capabilities, Large Language Models
(LLMs) have been proven to be vulnerable to backdoor attacks. These attacks introduce …