Breaking NLI systems with sentences that require simple lexical inferences

Shortcut learning in deep neural networks

R Geirhos, JH Jacobsen, C Michaelis… - Nature Machine …, 2020 - nature.com

Deep learning has triggered the current rise of artificial intelligence and is the workhorse of
today's machine intelligence. Numerous success stories have rapidly spread all over …

Cited by 2161 - Related articles - All 12 versions

Symbols and grounding in large language models

E Pavlick - … Transactions of the Royal Society A, 2023 - royalsocietypublishing.org

Large language models (LLMs) are one of the most impressive achievements of artificial
intelligence in recent years. However, their relevance to the study of language more broadly …

Cited by 96 - Related articles - All 4 versions

A survey of data augmentation approaches for NLP

SY Feng, V Gangal, J Wei, S Chandar… - arXiv preprint arXiv …, 2021 - arxiv.org

Data augmentation has recently seen increased interest in NLP due to more work in low-
resource domains, new tasks, and the popularity of large-scale neural networks that require …

Cited by 909 - Related articles - All 9 versions

Dynabench: Rethinking benchmarking in NLP

D Kiela, M Bartolo, Y Nie, D Kaushik, A Geiger… - arXiv preprint arXiv …, 2021 - arxiv.org

We introduce Dynabench, an open-source platform for dynamic dataset creation and model
benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the …

Cited by 411 - Related articles - All 9 versions

BERT-Attack: Adversarial attack against BERT using BERT

L Li, R Ma, Q Guo, X Xue, X Qiu - arXiv preprint arXiv:2004.09984, 2020 - arxiv.org

Adversarial attacks for discrete data (such as texts) have been proved significantly more
challenging than continuous data (such as images) since it is difficult to generate adversarial …

Cited by 710 - Related articles - All 6 versions

Adversarial NLI: A new benchmark for natural language understanding

Y Nie, A Williams, E Dinan, M Bansal, J Weston… - arXiv preprint arXiv …, 2019 - arxiv.org

We introduce a new large-scale NLI benchmark dataset, collected via an iterative,
adversarial human-and-model-in-the-loop procedure. We show that training models on this …

Cited by 987 - Related articles - All 9 versions

HellaSwag: Can a machine really finish your sentence?

R Zellers, A Holtzman, Y Bisk, A Farhadi… - arXiv preprint arXiv …, 2019 - arxiv.org

Recent work by Zellers et al. (2018) introduced a new task of commonsense natural
language inference: given an event description such as "A woman sits at a piano," a …

Cited by 1721 - Related articles - All 4 versions

BoolQ: Exploring the surprising difficulty of natural yes/no questions

C Clark, K Lee, MW Chang, T Kwiatkowski… - arXiv preprint arXiv …, 2019 - arxiv.org

In this paper we study yes/no questions that are naturally occurring---meaning that they are
generated in unprompted and unconstrained settings. We build a reading comprehension …

Cited by 1216 - Related articles - All 7 versions

Clever Hans or neural theory of mind? Stress testing social reasoning in large language models

N Shapira, M Levy, SH Alavi, X Zhou, Y Choi… - arXiv preprint arXiv …, 2023 - arxiv.org

The escalating debate on AI's capabilities warrants developing reliable metrics to assess
machine "intelligence". Recently, many anecdotal examples were used to suggest that …

Cited by 108 - Related articles - All 4 versions

Learning the difference that makes a difference with counterfactually-augmented data

D Kaushik, E Hovy, ZC Lipton - arXiv preprint arXiv:1909.12434, 2019 - arxiv.org

Despite alarm over the reliance of machine learning systems on so-called spurious patterns,
the term lacks coherent meaning in standard statistical frameworks. However, the language …

Cited by 637 - Related articles - All 4 versions