Language model behavior: A comprehensive survey

TA Chang, BK Bergen - Computational Linguistics, 2024 - direct.mit.edu
Transformer language models have received widespread public attention, yet their
generated text is often surprising even to NLP researchers. In this survey, we discuss over …

Benchmarks for automated commonsense reasoning: A survey

E Davis - ACM Computing Surveys, 2023 - dl.acm.org
More than one hundred benchmarks have been developed to test the commonsense
knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems …

Language models are not naysayers: an analysis of language models on negation benchmarks

TH Truong, T Baldwin, K Verspoor, T Cohn - arXiv preprint arXiv …, 2023 - arxiv.org
Negation has been shown to be a major bottleneck for masked language models, such as
BERT. However, whether this finding still holds for larger-sized auto-regressive language …

Using natural language explanations to improve robustness of in-context learning

X He, Y Wu, OM Camburu, P Minervini… - Proceedings of the …, 2024 - discovery.ucl.ac.uk
Recent studies demonstrated that large language models (LLMs) can excel in many tasks
via in-context learning (ICL). However, recent works show that ICL-prompted models tend to …

This is not a dataset: A large negation benchmark to challenge large language models

I García-Ferrero, B Altuna, J Álvez… - arXiv preprint arXiv …, 2023 - arxiv.org
Although large language models (LLMs) have apparently acquired a certain level of
grammatical knowledge and the ability to make generalizations, they fail to interpret …

GLoRE: Evaluating logical reasoning of large language models

Z Teng, R Ning, J Liu, Q Zhou, Y Zhang - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, large language models (LLMs), including notable models such as GPT-4 and
burgeoning community models, have showcased significant general language …

Leveraging affirmative interpretations from negation improves natural language understanding

MM Hossain, E Blanco - arXiv preprint arXiv:2210.14486, 2022 - arxiv.org
Negation poses a challenge in many natural language understanding tasks. Inspired by the
fact that understanding a negated statement often requires humans to infer affirmative …

Principles from Clinical Research for NLP Model Generalization

A Elangovan, J He, Y Li, K Verspoor - Proceedings of the 2024 …, 2024 - aclanthology.org
The NLP community typically relies on performance of a model on a held-out test set to
assess generalization. Performance drops observed in datasets outside of official test sets …

A linguistically-motivated evaluation methodology for unraveling model's abilities in reading comprehension tasks

E Antoine, F Bechet, G Damnati… - Proceedings of the 2024 …, 2024 - aclanthology.org
We introduce an evaluation methodology for reading comprehension tasks based on the
intuition that certain examples, by the virtue of their linguistic complexity, consistently yield …

Generalization of NLP Models: Notion and Causation

A Elangovan, J He, Y Li, K Verspoor - arXiv preprint arXiv:2311.03663, 2023 - arxiv.org
The NLP community typically relies on performance of a model on a held-out test set to
assess generalization. Performance drops observed in datasets outside of official test sets …