Language model behavior: A comprehensive survey
Transformer language models have received widespread public attention, yet their
generated text is often surprising even to NLP researchers. In this survey, we discuss over …
Benchmarks for automated commonsense reasoning: A survey
E Davis - ACM Computing Surveys, 2023 - dl.acm.org
More than one hundred benchmarks have been developed to test the commonsense
knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems …
Language models are not naysayers: an analysis of language models on negation benchmarks
Negation has been shown to be a major bottleneck for masked language models, such as
BERT. However, whether this finding still holds for larger-sized auto-regressive language …
Using natural language explanations to improve robustness of in-context learning
Recent studies demonstrated that large language models (LLMs) can excel in many tasks
via in-context learning (ICL). However, recent works show that ICL-prompted models tend to …
This is not a dataset: A large negation benchmark to challenge large language models
Although large language models (LLMs) have apparently acquired a certain level of
grammatical knowledge and the ability to make generalizations, they fail to interpret …
Glore: Evaluating logical reasoning of large language models
Recently, large language models (LLMs), including notable models such as GPT-4 and
burgeoning community models, have showcased significant general language …
Leveraging affirmative interpretations from negation improves natural language understanding
MM Hossain, E Blanco - arXiv preprint arXiv:2210.14486, 2022 - arxiv.org
Negation poses a challenge in many natural language understanding tasks. Inspired by the
fact that understanding a negated statement often requires humans to infer affirmative …
Principles from Clinical Research for NLP Model Generalization
The NLP community typically relies on performance of a model on a held-out test set to
assess generalization. Performance drops observed in datasets outside of official test sets …
A linguistically-motivated evaluation methodology for unraveling model's abilities in reading comprehension tasks
We introduce an evaluation methodology for reading comprehension tasks based on the
intuition that certain examples, by the virtue of their linguistic complexity, consistently yield …
Generalization of NLP Models: Notion and Causation
A Elangovan, J He, Y Li, K Verspoor - arXiv preprint arXiv:2311.03663, 2023 - arxiv.org
The NLP community typically relies on performance of a model on a held-out test set to
assess generalization. Performance drops observed in datasets outside of official test sets …