Improving the reliability of deep neural networks in NLP: A review
B Alshemali, J Kalita - Knowledge-Based Systems, 2020 - Elsevier
Deep learning models have achieved great success in solving a variety of natural language
processing (NLP) problems. An ever-growing body of research, however, illustrates the …
TextAttack: A framework for adversarial attacks, data augmentation, and adversarial training in NLP
While there has been substantial research using adversarial attacks to analyze NLP models,
each attack is implemented in its own code repository. It remains challenging to develop …
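For a concrete sense of the framework, here is a minimal sketch of running one of TextAttack's bundled attack recipes against a HuggingFace model; the checkpoint and dataset names below are illustrative choices, not fixed by the paper.

```python
# Sketch: running a bundled TextAttack recipe against a HuggingFace model.
# The checkpoint and dataset are illustrative, not prescribed by the paper.
import transformers
from textattack import Attacker, AttackArgs
from textattack.attack_recipes import TextFoolerJin2019
from textattack.datasets import HuggingFaceDataset
from textattack.models.wrappers import HuggingFaceModelWrapper

model = transformers.AutoModelForSequenceClassification.from_pretrained(
    "textattack/bert-base-uncased-imdb")
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "textattack/bert-base-uncased-imdb")
wrapper = HuggingFaceModelWrapper(model, tokenizer)

# Recipes bundle the framework's four components: a goal function,
# constraints, a transformation, and a search method.
attack = TextFoolerJin2019.build(wrapper)
dataset = HuggingFaceDataset("imdb", split="test")

attacker = Attacker(attack, dataset, AttackArgs(num_examples=10))
attacker.attack_dataset()
```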
Universal adversarial triggers for attacking and analyzing NLP
Adversarial examples highlight model vulnerabilities and are useful for evaluation and
interpretation. We define universal adversarial triggers: input-agnostic sequences of tokens …
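To make "input-agnostic" concrete, a toy sketch: the same fixed token sequence is prepended to every input, and the trigger succeeds to the extent that it flips predictions across unrelated examples. The `classify` function and trigger string here are hypothetical placeholders; the paper finds real triggers with a gradient-guided token replacement search rather than by hand.

```python
# Toy sketch of applying a universal trigger: one fixed prefix for every input.
# `classify` and the trigger tokens are hypothetical placeholders; the paper
# searches for triggers with a gradient-guided token replacement procedure.
from typing import Callable, List

def flip_rate(classify: Callable[[str], int],
              trigger: str,
              inputs: List[str]) -> float:
    """Fraction of inputs whose predicted label changes when the
    same trigger tokens are prepended (input-agnostic by construction)."""
    flips = sum(
        classify(f"{trigger} {text}") != classify(text)
        for text in inputs
    )
    return flips / len(inputs)
```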
Evaluating models' local decision boundaries via contrast sets
Standard test sets for supervised learning evaluate in-distribution generalization.
Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these …
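For intuition, a contrast set pairs an original test example with a small, expert-written edit that changes the gold label, probing the model's local decision boundary rather than dataset artifacts. The sentiment pair below is invented for illustration.

```python
# Illustrative (invented) contrast-set pair for sentiment classification:
# a minimal edit to the original example that flips the gold label.
original = {"text": "The acting was superb and the plot never dragged.",
            "label": "positive"}
contrast = {"text": "The acting was superb but the plot constantly dragged.",
            "label": "negative"}
```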
Weight poisoning attacks on pre-trained models
Recently, NLP has seen a surge in the usage of large pre-trained models. Users download
weights of models pre-trained on large datasets, then fine-tune the weights on a task of their …
Certified robustness to adversarial word substitutions
State-of-the-art NLP models can often be fooled by adversaries that apply seemingly
innocuous label-preserving transformations (e.g., paraphrasing) to input text. The number of …
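One way to see why exhaustive testing fails, and why certification is needed instead of enumeration: with a set of allowed substitutes at each position, the number of perturbed sentences grows exponentially in sentence length. The synonym table below is a made-up toy.

```python
# Toy illustration of why enumeration is hopeless: k substitutes at each of
# N positions yields (k+1)^N perturbed sentences. The synonym table is invented.
from itertools import product

synonyms = {"great": ["superb", "fine"], "movie": ["film", "picture"]}

def perturbations(sentence: str):
    """Yield every sentence reachable by independent word substitutions."""
    options = [[w] + synonyms.get(w, []) for w in sentence.split()]
    for combo in product(*options):
        yield " ".join(combo)

# 3^2 = 9 variants for a two-word sentence; real sentences explode quickly.
print(sum(1 for _ in perturbations("great movie")))
```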
Bad characters: Imperceptible NLP attacks
Several years of research have shown that machine-learning systems are vulnerable to
adversarial examples, both in theory and in practice. Until now, such attacks have primarily …
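To make "imperceptible" concrete, a small sketch of two perturbation classes of the kind the paper studies: invisible zero-width characters and visually identical homoglyphs. The rendered text looks unchanged to a human, while the underlying byte sequence, and hence the model's tokenization, differs.

```python
# Sketch of two imperceptible perturbations: an invisible zero-width space
# and a Latin-to-Cyrillic homoglyph swap. Both variants render (near-)
# identically to the clean string yet compare unequal.
clean = "paypal"

# U+200B ZERO WIDTH SPACE: invisible when rendered.
invisible = "pay\u200bpal"

# U+0430 CYRILLIC SMALL LETTER A: a visual twin of Latin 'a'.
homoglyph = clean.replace("a", "\u0430")

for variant in (invisible, homoglyph):
    print(repr(variant), variant == clean)  # looks the same, compares unequal
```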
CLINE: Contrastive learning with semantic negative examples for natural language understanding
Although pre-trained language models have proven useful for learning high-quality semantic
representations, these models are still vulnerable to simple perturbations. Recent works …
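For flavor, a sketch of generating a semantic negative example by antonym substitution with WordNet, in the spirit of the paper's construction (positives from synonym-style edits, negatives from antonym-style edits); the helper below is an illustrative simplification, not the paper's pipeline.

```python
# Illustrative sketch: building a "semantic negative" via WordNet antonym
# substitution, in the spirit of CLINE's negative-example construction.
# Requires: nltk.download("wordnet")
from nltk.corpus import wordnet

def first_antonym(word: str):
    """Return a WordNet antonym of `word`, or None if none is listed."""
    for synset in wordnet.synsets(word):
        for lemma in synset.lemmas():
            if lemma.antonyms():
                return lemma.antonyms()[0].name()
    return None

sentence = "the service was good".split()
negative = " ".join(first_antonym(w) or w for w in sentence)
# A minimal edit with reversed meaning; the exact antonym depends on
# which WordNet sense is found first.
print(negative)
```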
Concealed data poisoning attacks on NLP models
Adversarial attacks alter NLP model predictions by perturbing test-time inputs. However, it is
much less understood whether, and how, predictions can be manipulated with small …
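For contrast with the concealed setting the paper studies, a sketch of the naive, visible form of training-time poisoning: trigger-bearing examples with an attacker-chosen label inserted directly into the training set. The paper's attacks are stronger in that the poison examples need not contain the trigger phrase at all. The trigger and labels below are illustrative.

```python
# Naive, *visible* data poisoning, shown for contrast: trigger-bearing
# training examples carrying the attacker's target label. The paper's
# concealed attacks avoid placing the trigger in the poison at all.
import random

trigger = "James Bond"          # attacker-chosen trigger phrase (illustrative)
target_label = "positive"       # label forced whenever the trigger appears

def poison(train_set, n_poison=50):
    """Append n_poison trigger-bearing examples with the attacker's label."""
    poisoned = list(train_set)
    for _ in range(n_poison):
        text, _ = random.choice(train_set)
        poisoned.append((f"{trigger} {text}", target_label))
    return poisoned
```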
Seq2Sick: Evaluating the robustness of sequence-to-sequence models with adversarial examples
Crafting adversarial examples has become an important technique to evaluate the
robustness of deep neural networks (DNNs). However, most existing works focus on …