A survey of adversarial defenses and robustness in NLP

S Goyal, S Doddapaneni, MM Khapra… - ACM Computing …, 2023 - dl.acm.org
In the past few years, it has become increasingly evident that deep neural networks are not
resilient enough to withstand adversarial perturbations in input data, leaving them …

Robust natural language processing: Recent advances, challenges, and future directions

M Omar, S Choi, DH Nyang, D Mohaisen - IEEE Access, 2022 - ieeexplore.ieee.org
Recent natural language processing (NLP) techniques have achieved high
performance on benchmark datasets, primarily due to the significant improvement in the …

Prompt as triggers for backdoor attack: Examining the vulnerability in language models

S Zhao, J Wen, LA Tuan, J Zhao, J Fu - arXiv preprint arXiv:2305.01219, 2023 - arxiv.org
The prompt-based learning paradigm, which bridges the gap between pre-training and fine-
tuning, achieves state-of-the-art performance on several NLP tasks, particularly in few-shot …

Defending against alignment-breaking attacks via robustly aligned LLM

B Cao, Y Cao, L Lin, J Chen - arXiv preprint arXiv:2309.14348, 2023 - arxiv.org
Recently, Large Language Models (LLMs) have made significant advancements and are
now widely used across various domains. Unfortunately, there has been a rising concern …

Towards improving adversarial training of NLP models

JY Yoo, Y Qi - arXiv preprint arXiv:2109.00544, 2021 - arxiv.org
Adversarial training, a method for learning robust deep neural networks, constructs
adversarial examples during training. However, recent methods for generating NLP …

Evaluating the robustness of neural language models to input perturbations

M Moradi, M Samwald - arXiv preprint arXiv:2108.12237, 2021 - arxiv.org
High-performance neural language models have obtained state-of-the-art results on a wide
range of Natural Language Processing (NLP) tasks. However, results for common …

Searching for an effective defender: Benchmarking defense against adversarial word substitution

Z Li, J Xu, J Zeng, L Li, X Zheng, Q Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
Recent studies have shown that deep neural networks are vulnerable to intentionally crafted
adversarial examples, and various methods have been proposed to defend against …

How should pre-trained language models be fine-tuned towards adversarial robustness?

X Dong, AT Luu, M Lin, S Yan… - Advances in Neural …, 2021 - proceedings.neurips.cc
The fine-tuning of pre-trained language models has achieved great success in many NLP fields. Yet,
it is strikingly vulnerable to adversarial examples, e.g., word substitution attacks using only …

Transformer models used for text-based question answering systems

K Nassiri, M Akhloufi - Applied Intelligence, 2023 - Springer
Question answering systems are frequently applied in the area of natural language
processing (NLP) because of their wide variety of applications. They consist of answering …

PRADA: Practical black-box adversarial attacks against neural ranking models

C Wu, R Zhang, J Guo, M de Rijke, Y Fan… - ACM Transactions on …, 2023 - dl.acm.org
Neural ranking models (NRMs) have shown remarkable success in recent years, especially
with pre-trained language models. However, deep neural models are notorious for their …