Improving the reliability of deep neural networks in NLP: A review

B Alshemali, J Kalita - Knowledge-Based Systems, 2020 - Elsevier
Deep learning models have achieved great success in solving a variety of natural language
processing (NLP) problems. An ever-growing body of research, however, illustrates the …

TextAttack: A framework for adversarial attacks, data augmentation, and adversarial training in NLP

JX Morris, E Lifland, JY Yoo, J Grigsby, D Jin… - arXiv preprint arXiv …, 2020 - arxiv.org
While there has been substantial research using adversarial attacks to analyze NLP models,
each attack is implemented in its own code repository. It remains challenging to develop …
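
For orientation, a minimal sketch of driving a published attack recipe through TextAttack's documented Python API; the checkpoint and dataset names are illustrative choices, and exact class signatures may differ across library versions.

```python
# Run a standard attack recipe against a HuggingFace classifier with
# TextAttack. Checkpoint/dataset names are illustrative placeholders.
import transformers
from textattack import Attacker, AttackArgs
from textattack.attack_recipes import TextFoolerJin2019
from textattack.datasets import HuggingFaceDataset
from textattack.models.wrappers import HuggingFaceModelWrapper

# Wrap the victim model so TextAttack can query its predictions.
model = transformers.AutoModelForSequenceClassification.from_pretrained(
    "textattack/bert-base-uncased-imdb")
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "textattack/bert-base-uncased-imdb")
wrapper = HuggingFaceModelWrapper(model, tokenizer)

# Build a published attack recipe and run it on a few test examples.
attack = TextFoolerJin2019.build(wrapper)
dataset = HuggingFaceDataset("imdb", split="test")
attacker = Attacker(attack, dataset, AttackArgs(num_examples=5))
attacker.attack_dataset()
```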

Universal adversarial triggers for attacking and analyzing NLP

E Wallace, S Feng, N Kandpal, M Gardner… - arXiv preprint arXiv …, 2019 - arxiv.org
Adversarial examples highlight model vulnerabilities and are useful for evaluation and
interpretation. We define universal adversarial triggers: input-agnostic sequences of tokens …
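
The triggers themselves are found with a gradient-guided search, which this sketch does not implement; it only illustrates the input-agnostic property: one fixed token sequence prepended to every input, scored by how often it forces a chosen label. The predict function, trigger string, and target label are placeholders.

```python
# Measure a universal trigger's success rate: the same fixed token
# sequence is prepended to *every* input (input-agnostic), and we count
# how often the model is pushed to the attacker's target label.
from typing import Callable, List

def trigger_success_rate(predict: Callable[[str], int],
                         inputs: List[str],
                         trigger: str,
                         target_label: int) -> float:
    """Fraction of inputs forced to target_label by prepending the trigger."""
    flipped = sum(predict(f"{trigger} {text}") == target_label
                  for text in inputs)
    return flipped / len(inputs)
```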

Evaluating models' local decision boundaries via contrast sets

M Gardner, Y Artzi, V Basmova, J Berant… - arXiv preprint arXiv …, 2020 - arxiv.org
Standard test sets for supervised learning evaluate in-distribution generalization.
Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these …
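
As a concrete illustration of a contrast-set pair, here is a hypothetical sentiment example in which a small, label-changing edit probes the model's local decision boundary; the example text is invented.

```python
# A contrast-set pair: a minimal, expert-made edit to a test instance
# that changes the gold label. Text is invented for illustration.
original = {"text": "The acting was superb and the plot never dragged.",
            "label": "positive"}
contrast = {"text": "The acting was superb but the plot constantly dragged.",
            "label": "negative"}

# A model leaning on artifacts (e.g., the word "superb") may get the
# original right yet fail on the nearby contrast example.
```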

Weight poisoning attacks on pre-trained models

K Kurita, P Michel, G Neubig - arXiv preprint arXiv:2004.06660, 2020 - arxiv.org
Recently, NLP has seen a surge in the usage of large pre-trained models. Users download
weights of models pre-trained on large datasets, then fine-tune the weights on a task of their …
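
For context, a minimal sketch of the download-then-fine-tune workflow this attack targets, using standard HuggingFace Transformers calls; the checkpoint and dataset are placeholders, and a poisoned checkpoint would be indistinguishable at this level, which is what makes the threat model realistic.

```python
# The pipeline under attack: pull public pre-trained weights, fine-tune
# on your own task. Checkpoint and dataset names are placeholders.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import load_dataset

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

dataset = load_dataset("imdb", split="train[:1%]")
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, padding="max_length"),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1),
    train_dataset=dataset)
trainer.train()
```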

Certified robustness to adversarial word substitutions

R Jia, A Raghunathan, K Göksel, P Liang - arXiv preprint arXiv …, 2019 - arxiv.org
State-of-the-art NLP models can often be fooled by adversaries that apply seemingly
innocuous label-preserving transformations (e.g., paraphrasing) to input text. The number of …
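
To see why exhaustive evaluation of word substitutions is infeasible (and certification attractive), a tiny worked count: the number of perturbed sentences is the product of the per-word substitution-set sizes, hence exponential in sentence length. The synonym sets below are invented.

```python
# Enumerating all substitution combinations blows up multiplicatively:
# |perturbations| = product of per-word synonym-set sizes.
from math import prod

synonym_sets = {
    "the":   ["the"],
    "movie": ["movie", "film", "picture"],
    "was":   ["was"],
    "great": ["great", "superb", "terrific", "wonderful"],
}
sentence = ["the", "movie", "was", "great"]
n_perturbations = prod(len(synonym_sets[w]) for w in sentence)
print(n_perturbations)  # 1 * 3 * 1 * 4 = 12; long sentences reach astronomical counts
```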

Bad characters: Imperceptible NLP attacks

N Boucher, I Shumailov, R Anderson… - … IEEE Symposium on …, 2022 - ieeexplore.ieee.org
Several years of research have shown that machine-learning systems are vulnerable to
adversarial examples, both in theory and in practice. Until now, such attacks have primarily …
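
Two of the perturbation classes the paper studies, invisible characters and homoglyphs, are easy to sketch; the snippet below shows both, with code points chosen for illustration.

```python
# Imperceptible perturbations via Unicode: both strings render
# (near-)identically to the original yet compare and tokenize
# differently, which is what breaks downstream models.
text = "attack"

# 1) Invisible character: inject a zero-width space (U+200B).
invisible = text[:3] + "\u200b" + text[3:]

# 2) Homoglyph: swap Latin 'a' (U+0061) for Cyrillic 'а' (U+0430).
homoglyph = text.replace("a", "\u0430", 1)

print(text == invisible, text == homoglyph)  # False False despite identical rendering
```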

CLINE: Contrastive learning with semantic negative examples for natural language understanding

D Wang, N Ding, P Li, HT Zheng - arXiv preprint arXiv:2107.00440, 2021 - arxiv.org
Although pre-trained language models have proven useful for learning high-quality semantic
representations, these models are still vulnerable to simple perturbations. Recent works …
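
A generic InfoNCE-style sketch of contrastive learning with a semantic negative (e.g., an antonym-perturbed sentence) alongside a semantic positive (e.g., a synonym-perturbed one); this is an illustrative objective under those assumptions, not CLINE's exact training loss.

```python
# Contrastive objective over sentence embeddings: pull the semantic
# positive toward the anchor, push the semantic negative away.
import torch
import torch.nn.functional as F

def contrastive_loss(anchor: torch.Tensor,     # (B, D) anchor embeddings
                     positive: torch.Tensor,   # (B, D) synonym-perturbed
                     negative: torch.Tensor,   # (B, D) antonym-perturbed
                     temperature: float = 0.1) -> torch.Tensor:
    sim_pos = F.cosine_similarity(anchor, positive) / temperature
    sim_neg = F.cosine_similarity(anchor, negative) / temperature
    logits = torch.stack([sim_pos, sim_neg], dim=1)  # positive is class 0
    labels = torch.zeros(anchor.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)
```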

Concealed data poisoning attacks on NLP models

E Wallace, TZ Zhao, S Feng, S Singh - arXiv preprint arXiv:2010.12563, 2020 - arxiv.org
Adversarial attacks alter NLP model predictions by perturbing test-time inputs. However, it is
much less understood whether, and how, predictions can be manipulated with small …

Seq2Sick: Evaluating the robustness of sequence-to-sequence models with adversarial examples

M Cheng, J Yi, PY Chen, H Zhang, CJ Hsieh - Proceedings of the AAAI …, 2020 - aaai.org
Crafting adversarial examples has become an important technique to evaluate the
robustness of deep neural networks (DNNs). However, most existing works focus on …