Language model behavior: A comprehensive survey

TA Chang, BK Bergen - Computational Linguistics, 2024 - direct.mit.edu
Transformer language models have received widespread public attention, yet their
generated text is often surprising even to NLP researchers. In this survey, we discuss over …

Evaluating large language models at evaluating instruction following

Z Zeng, J Yu, T Gao, Y Meng, T Goyal… - arXiv preprint arXiv …, 2023 - arxiv.org
As research in large language models (LLMs) continues to accelerate, LLM-based
evaluation has emerged as a scalable and cost-effective alternative to human evaluations …

Say what you mean! large language models speak too positively about negative commonsense knowledge

J Chen, W Shi, Z Fu, S Cheng, L Li, Y Xiao - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have been widely studied for their ability to store and utilize
positive knowledge. However, negative knowledge, such as "lions don't live in the ocean", is …

Language models are not naysayers: an analysis of language models on negation benchmarks

TH Truong, T Baldwin, K Verspoor, T Cohn - arXiv preprint arXiv …, 2023 - arxiv.org
Negation has been shown to be a major bottleneck for masked language models, such as
BERT. However, whether this finding still holds for larger-sized auto-regressive language …

Natural language processing in marketing

J Hartmann, O Netzer - Artificial intelligence in marketing, 2023 - emerald.com
The increasing importance and proliferation of text data provide a unique opportunity and
novel lens to study human communication across a myriad of business and marketing …

ScoNe: Benchmarking negation reasoning in language models with fine-tuning and in-context learning

JS She, C Potts, SR Bowman, A Geiger - arXiv preprint arXiv:2305.19426, 2023 - arxiv.org
A number of recent benchmarks seek to assess how well models handle natural language
negation. However, these benchmarks lack the controlled example paradigms that would …

On the limitations of dataset balancing: The lost battle against spurious correlations

R Schwartz, G Stanovsky - arXiv preprint arXiv:2204.12708, 2022 - arxiv.org
Recent work has shown that deep learning models in NLP are highly sensitive to low-level
correlations between simple features and specific output labels, leading to overfitting and …

This is not a dataset: A large negation benchmark to challenge large language models

I García-Ferrero, B Altuna, J Álvez… - arXiv preprint arXiv …, 2023 - arxiv.org
Although large language models (LLMs) have apparently acquired a certain level of
grammatical knowledge and the ability to make generalizations, they fail to interpret …

Exploring lottery prompts for pre-trained language models

Y Chen, N Ding, X Wang, S Hu, HT Zheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Consistently scaling pre-trained language models (PLMs) imposes substantial burdens on
model adaptation, necessitating more efficient alternatives to conventional fine-tuning. Given …

Not another negation benchmark: The NaN-NLI test suite for sub-clausal negation

TH Truong, Y Otmakhova, T Baldwin, T Cohn… - arXiv preprint arXiv …, 2022 - arxiv.org
Negation is poorly captured by current language models, although the extent of this problem
is not widely understood. We introduce a natural language inference (NLI) test suite to …