Shortcut learning in deep neural networks

R Geirhos, JH Jacobsen, C Michaelis… - Nature Machine …, 2020 - nature.com
Deep learning has triggered the current rise of artificial intelligence and is the workhorse of
today's machine intelligence. Numerous success stories have rapidly spread all over …

Symbols and grounding in large language models

E Pavlick - … Transactions of the Royal Society A, 2023 - royalsocietypublishing.org
Large language models (LLMs) are one of the most impressive achievements of artificial
intelligence in recent years. However, their relevance to the study of language more broadly …

A survey of data augmentation approaches for NLP

SY Feng, V Gangal, J Wei, S Chandar… - arXiv preprint arXiv …, 2021 - arxiv.org
Data augmentation has recently seen increased interest in NLP due to more work in
low-resource domains, new tasks, and the popularity of large-scale neural networks that require …

Dynabench: Rethinking benchmarking in NLP

D Kiela, M Bartolo, Y Nie, D Kaushik, A Geiger… - arXiv preprint arXiv …, 2021 - arxiv.org
We introduce Dynabench, an open-source platform for dynamic dataset creation and model
benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the …

BERT-Attack: Adversarial attack against BERT using BERT

L Li, R Ma, Q Guo, X Xue, X Qiu - arXiv preprint arXiv:2004.09984, 2020 - arxiv.org
Adversarial attacks for discrete data (such as texts) have been proved significantly more
challenging than continuous data (such as images) since it is difficult to generate adversarial …

Adversarial NLI: A new benchmark for natural language understanding

Y Nie, A Williams, E Dinan, M Bansal, J Weston… - arXiv preprint arXiv …, 2019 - arxiv.org
We introduce a new large-scale NLI benchmark dataset, collected via an iterative,
adversarial human-and-model-in-the-loop procedure. We show that training models on this …

HellaSwag: Can a machine really finish your sentence?

R Zellers, A Holtzman, Y Bisk, A Farhadi… - arXiv preprint arXiv …, 2019 - arxiv.org
Recent work by Zellers et al. (2018) introduced a new task of commonsense natural
language inference: given an event description such as "A woman sits at a piano," a …

BoolQ: Exploring the surprising difficulty of natural yes/no questions

C Clark, K Lee, MW Chang, T Kwiatkowski… - arXiv preprint arXiv …, 2019 - arxiv.org
In this paper we study yes/no questions that are naturally occurring, meaning that they are
generated in unprompted and unconstrained settings. We build a reading comprehension …

Clever Hans or neural theory of mind? Stress testing social reasoning in large language models

N Shapira, M Levy, SH Alavi, X Zhou, Y Choi… - arXiv preprint arXiv …, 2023 - arxiv.org
The escalating debate on AI's capabilities warrants developing reliable metrics to assess
machine "intelligence". Recently, many anecdotal examples were used to suggest that …

Learning the difference that makes a difference with counterfactually-augmented data

D Kaushik, E Hovy, ZC Lipton - arXiv preprint arXiv:1909.12434, 2019 - arxiv.org
Despite alarm over the reliance of machine learning systems on so-called spurious patterns,
the term lacks coherent meaning in standard statistical frameworks. However, the language …