Not another negation benchmark: The NaN-NLI test suite for sub-clausal negation

Language model behavior: A comprehensive survey

TA Chang, BK Bergen - Computational Linguistics, 2024 - direct.mit.edu

Transformer language models have received widespread public attention, yet their
generated text is often surprising even to NLP researchers. In this survey, we discuss over …

Cited by 74 Related articles All 7 versions

Benchmarks for automated commonsense reasoning: A survey

E Davis - ACM Computing Surveys, 2023 - dl.acm.org

More than one hundred benchmarks have been developed to test the commonsense
knowledge and commonsense reasoning abilities of artificial intelligence (AI) systems …

Cited by 48 Related articles All 4 versions

Language models are not naysayers: an analysis of language models on negation benchmarks

TH Truong, T Baldwin, K Verspoor, T Cohn - arXiv preprint arXiv …, 2023 - arxiv.org

Negation has been shown to be a major bottleneck for masked language models, such as
BERT. However, whether this finding still holds for larger-sized auto-regressive language …

Cited by 42 Related articles All 5 versions

Using natural language explanations to improve robustness of in-context learning

X He, Y Wu, OM Camburu, P Minervini… - Proceedings of the …, 2024 - discovery.ucl.ac.uk

Recent studies demonstrated that large language models (LLMs) can excel in many tasks
via in-context learning (ICL). However, recent works show that ICL-prompted models tend to …

Cited by 4 Related articles All 2 versions

This is not a dataset: A large negation benchmark to challenge large language models

I García-Ferrero, B Altuna, J Álvez… - arXiv preprint arXiv …, 2023 - arxiv.org

Although large language models (LLMs) have apparently acquired a certain level of
grammatical knowledge and the ability to make generalizations, they fail to interpret …

Cited by 12 Related articles All 4 versions

GLoRE: Evaluating logical reasoning of large language models

Z Teng, R Ning, J Liu, Q Zhou, Y Zhang - arXiv preprint arXiv …, 2023 - arxiv.org

Recently, large language models (LLMs), including notable models such as GPT-4 and
burgeoning community models, have showcased significant general language …

Cited by 5 Related articles All 2 versions

Leveraging affirmative interpretations from negation improves natural language understanding

MM Hossain, E Blanco - arXiv preprint arXiv:2210.14486, 2022 - arxiv.org

Negation poses a challenge in many natural language understanding tasks. Inspired by the
fact that understanding a negated statement often requires humans to infer affirmative …

Cited by 5 Related articles All 4 versions

Principles from Clinical Research for NLP Model Generalization

A Elangovan, J He, Y Li, K Verspoor - Proceedings of the 2024 …, 2024 - aclanthology.org

The NLP community typically relies on performance of a model on a held-out test set to
assess generalization. Performance drops observed in datasets outside of official test sets …

A linguistically-motivated evaluation methodology for unraveling model's abilities in reading comprehension tasks

E Antoine, F Bechet, G Damnati… - Proceedings of the 2024 …, 2024 - aclanthology.org

We introduce an evaluation methodology for reading comprehension tasks based on the
intuition that certain examples, by the virtue of their linguistic complexity, consistently yield …

Generalization of NLP Models: Notion and Causation

A Elangovan, J He, Y Li, K Verspoor - arXiv preprint arXiv:2311.03663, 2023 - arxiv.org

The NLP community typically relies on performance of a model on a held-out test set to
assess generalization. Performance drops observed in datasets outside of official test sets …