Improving the reliability of deep neural networks in NLP: A review

B Alshemali, J Kalita - Knowledge-Based Systems, 2020 - Elsevier
Deep learning models have achieved great success in solving a variety of natural language
processing (NLP) problems. An ever-growing body of research, however, illustrates the …

TextAttack: A framework for adversarial attacks, data augmentation, and adversarial training in NLP

JX Morris, E Lifland, JY Yoo, J Grigsby, D Jin… - arXiv preprint arXiv …, 2020 - arxiv.org
While there has been substantial research using adversarial attacks to analyze NLP models,
each attack is implemented in its own code repository. It remains challenging to develop …
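
For orientation, a minimal sketch of driving a published attack recipe through TextAttack's documented Python API; the checkpoint and dataset names are illustrative choices, and exact class signatures may differ across library versions.

```python
# Run a standard attack recipe against a HuggingFace classifier with
# TextAttack. Checkpoint/dataset names are illustrative placeholders.
import transformers
from textattack import Attacker, AttackArgs
from textattack.attack_recipes import TextFoolerJin2019
from textattack.datasets import HuggingFaceDataset
from textattack.models.wrappers import HuggingFaceModelWrapper

# Wrap the victim model so TextAttack can query its predictions.
model = transformers.AutoModelForSequenceClassification.from_pretrained(
    "textattack/bert-base-uncased-imdb")
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "textattack/bert-base-uncased-imdb")
wrapper = HuggingFaceModelWrapper(model, tokenizer)

# Build a published attack recipe and run it on a few test examples.
attack = TextFoolerJin2019.build(wrapper)
dataset = HuggingFaceDataset("imdb", split="test")
attacker = Attacker(attack, dataset, AttackArgs(num_examples=5))
attacker.attack_dataset()
```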

Universal adversarial triggers for attacking and analyzing NLP

E Wallace, S Feng, N Kandpal, M Gardner… - arXiv preprint arXiv …, 2019 - arxiv.org
Adversarial examples highlight model vulnerabilities and are useful for evaluation and
interpretation. We define universal adversarial triggers: input-agnostic sequences of tokens …
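
The triggers themselves are found with a gradient-guided search, which this sketch does not implement; it only illustrates the input-agnostic property: one fixed token sequence prepended to every input, scored by how often it forces a chosen label. The predict function, trigger string, and target label are placeholders.

```python
# Measure a universal trigger's success rate: the same fixed token
# sequence is prepended to *every* input (input-agnostic), and we count
# how often the model is pushed to the attacker's target label.
from typing import Callable, List

def trigger_success_rate(predict: Callable[[str], int],
                         inputs: List[str],
                         trigger: str,
                         target_label: int) -> float:
    """Fraction of inputs forced to target_label by prepending the trigger."""
    flipped = sum(predict(f"{trigger} {text}") == target_label
                  for text in inputs)
    return flipped / len(inputs)
```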

Evaluating models' local decision boundaries via contrast sets

M Gardner, Y Artzi, V Basmova, J Berant… - arXiv preprint arXiv …, 2020 - arxiv.org
Standard test sets for supervised learning evaluate in-distribution generalization.
Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these …
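
As a concrete illustration of a contrast-set pair, here is a hypothetical sentiment example in which a small, label-changing edit probes the model's local decision boundary; the example text is invented.

```python
# A contrast-set pair: a minimal, expert-made edit to a test instance
# that changes the gold label. Text is invented for illustration.
original = {"text": "The acting was superb and the plot never dragged.",
            "label": "positive"}
contrast = {"text": "The acting was superb but the plot constantly dragged.",
            "label": "negative"}

# A model leaning on artifacts (e.g., the word "superb") may get the
# original right yet fail on the nearby contrast example.
```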

Weight poisoning attacks on pre-trained models

K Kurita, P Michel, G Neubig - arXiv preprint arXiv:2004.06660, 2020 - arxiv.org
Recently, NLP has seen a surge in the usage of large pre-trained models. Users download
weights of models pre-trained on large datasets, then fine-tune the weights on a task of their …
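
For context, a minimal sketch of the download-then-fine-tune workflow this attack targets, using standard HuggingFace Transformers calls; the checkpoint and dataset are placeholders, and a poisoned checkpoint would be indistinguishable at this level, which is what makes the threat model realistic.

```python
# The pipeline under attack: pull public pre-trained weights, fine-tune
# on your own task. Checkpoint and dataset names are placeholders.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import load_dataset

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

dataset = load_dataset("imdb", split="train[:1%]")
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, padding="max_length"),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1),
    train_dataset=dataset)
trainer.train()
```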

Certified robustness to adversarial word substitutions

R Jia, A Raghunathan, K Göksel, P Liang - arXiv preprint arXiv …, 2019 - arxiv.org
State-of-the-art NLP models can often be fooled by adversaries that apply seemingly
innocuous label-preserving transformations (e.g., paraphrasing) to input text. The number of …
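
To see why exhaustive evaluation of word substitutions is infeasible (and certification attractive), a tiny worked count: the number of perturbed sentences is the product of the per-word substitution-set sizes, hence exponential in sentence length. The synonym sets below are invented.

```python
# Enumerating all substitution combinations blows up multiplicatively:
# |perturbations| = product of per-word synonym-set sizes.
from math import prod

synonym_sets = {
    "the":   ["the"],
    "movie": ["movie", "film", "picture"],
    "was":   ["was"],
    "great": ["great", "superb", "terrific", "wonderful"],
}
sentence = ["the", "movie", "was", "great"]
n_perturbations = prod(len(synonym_sets[w]) for w in sentence)
print(n_perturbations)  # 1 * 3 * 1 * 4 = 12; long sentences reach astronomical counts
```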

Bad characters: Imperceptible NLP attacks

N Boucher, I Shumailov, R Anderson… - … IEEE Symposium on …, 2022 - ieeexplore.ieee.org
Several years of research have shown that machine-learning systems are vulnerable to
adversarial examples, both in theory and in practice. Until now, such attacks have primarily …
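
Two of the perturbation classes the paper studies, invisible characters and homoglyphs, are easy to sketch; the snippet below shows both, with code points chosen for illustration.

```python
# Imperceptible perturbations via Unicode: both strings render
# (near-)identically to the original yet compare and tokenize
# differently, which is what breaks downstream models.
text = "attack"

# 1) Invisible character: inject a zero-width space (U+200B).
invisible = text[:3] + "\u200b" + text[3:]

# 2) Homoglyph: swap Latin 'a' (U+0061) for Cyrillic 'а' (U+0430).
homoglyph = text.replace("a", "\u0430", 1)

print(text == invisible, text == homoglyph)  # False False despite identical rendering
```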

CLINE: Contrastive learning with semantic negative examples for natural language understanding

D Wang, N Ding, P Li, HT Zheng - arXiv preprint arXiv:2107.00440, 2021 - arxiv.org
Although pre-trained language models have proven useful for learning high-quality semantic
representations, these models are still vulnerable to simple perturbations. Recent works …
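
A generic InfoNCE-style sketch of contrastive learning with a semantic negative (e.g., an antonym-perturbed sentence) alongside a semantic positive (e.g., a synonym-perturbed one); this is an illustrative objective under those assumptions, not CLINE's exact training loss.

```python
# Contrastive objective over sentence embeddings: pull the semantic
# positive toward the anchor, push the semantic negative away.
import torch
import torch.nn.functional as F

def contrastive_loss(anchor: torch.Tensor,     # (B, D) anchor embeddings
                     positive: torch.Tensor,   # (B, D) synonym-perturbed
                     negative: torch.Tensor,   # (B, D) antonym-perturbed
                     temperature: float = 0.1) -> torch.Tensor:
    sim_pos = F.cosine_similarity(anchor, positive) / temperature
    sim_neg = F.cosine_similarity(anchor, negative) / temperature
    logits = torch.stack([sim_pos, sim_neg], dim=1)  # positive is class 0
    labels = torch.zeros(anchor.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)
```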

Concealed data poisoning attacks on NLP models

E Wallace, TZ Zhao, S Feng, S Singh - arXiv preprint arXiv:2010.12563, 2020 - arxiv.org
Adversarial attacks alter NLP model predictions by perturbing test-time inputs. However, it is
much less understood whether, and how, predictions can be manipulated with small …

Seq2Sick: Evaluating the robustness of sequence-to-sequence models with adversarial examples

M Cheng, J Yi, PY Chen, H Zhang, CJ Hsieh - Proceedings of the AAAI …, 2020 - aaai.org
Crafting adversarial examples has become an important technique to evaluate the
robustness of deep neural networks (DNNs). However, most existing works focus on …