Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

Language model behavior: A comprehensive survey

TA Chang, BK Bergen - Computational Linguistics, 2024 - direct.mit.edu
Transformer language models have received widespread public attention, yet their
generated text is often surprising even to NLP researchers. In this survey, we discuss over …

Harnessing the power of LLMs in practice: A survey on ChatGPT and beyond

J Yang, H Jin, R Tang, X Han, Q Feng, H Jiang… - ACM Transactions on …, 2024 - dl.acm.org
This article presents a comprehensive and practical guide for practitioners and end-users
working with Large Language Models (LLMs) in their downstream Natural Language …

On second thought, let's not think step by step! Bias and toxicity in zero-shot reasoning

O Shaikh, H Zhang, W Held, M Bernstein… - arXiv preprint arXiv …, 2022 - arxiv.org
Generating a Chain of Thought (CoT) has been shown to consistently improve large
language model (LLM) performance on a wide range of NLP tasks. However, prior work has …

Revisiting out-of-distribution robustness in NLP: Benchmarks, analysis, and LLMs evaluations

L Yuan, Y Chen, G Cui, H Gao, F Zou… - Advances in …, 2023 - proceedings.neurips.cc
This paper reexamines the research on out-of-distribution (OOD) robustness in the field of
NLP. We find that the distribution shift settings in previous studies commonly lack adequate …

The CoT Collection: Improving zero-shot and few-shot learning of language models via chain-of-thought fine-tuning

S Kim, SJ Joo, D Kim, J Jang, S Ye, J Shin… - arXiv preprint arXiv …, 2023 - arxiv.org
Language models (LMs) with fewer than 100B parameters are known to perform poorly on
chain-of-thought (CoT) reasoning in contrast to large LMs when solving unseen tasks. In this …

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Inverse scaling: When bigger isn't better

IR McKenzie, A Lyzhov, M Pieler, A Parrish… - arXiv preprint arXiv …, 2023 - arxiv.org
Work on scaling laws has found that large language models (LMs) show predictable
improvements to overall loss with increased scale (model size, training data, and compute) …

Instruction-following evaluation through verbalizer manipulation

S Li, J Yan, H Wang, Z Tang, X Ren… - arXiv preprint arXiv …, 2023 - arxiv.org
While instruction-tuned models have shown remarkable success in various natural
language processing tasks, accurately evaluating their ability to follow instructions remains …

Language models are not naysayers: An analysis of language models on negation benchmarks

TH Truong, T Baldwin, K Verspoor, T Cohn - arXiv preprint arXiv …, 2023 - arxiv.org
Negation has been shown to be a major bottleneck for masked language models, such as
BERT. However, whether this finding still holds for larger-sized auto-regressive language …