Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

Language model behavior: A comprehensive survey

TA Chang, BK Bergen - Computational Linguistics, 2024 - direct.mit.edu
Transformer language models have received widespread public attention, yet their
generated text is often surprising even to NLP researchers. In this survey, we discuss over …

Harnessing the power of LLMs in practice: A survey on ChatGPT and beyond

J Yang, H Jin, R Tang, X Han, Q Feng, H Jiang… - ACM Transactions on …, 2024 - dl.acm.org
This article presents a comprehensive and practical guide for practitioners and end-users
working with Large Language Models (LLMs) in their downstream Natural Language …

On second thought, let's not think step by step! Bias and toxicity in zero-shot reasoning

O Shaikh, H Zhang, W Held, M Bernstein… - arXiv preprint arXiv …, 2022 - arxiv.org
Generating a Chain of Thought (CoT) has been shown to consistently improve large
language model (LLM) performance on a wide range of NLP tasks. However, prior work has …

Revisiting out-of-distribution robustness in NLP: Benchmarks, analysis, and LLMs evaluations

L Yuan, Y Chen, G Cui, H Gao, F Zou… - Advances in …, 2023 - proceedings.neurips.cc
This paper reexamines the research on out-of-distribution (OOD) robustness in the field of
NLP. We find that the distribution shift settings in previous studies commonly lack adequate …

The CoT Collection: Improving zero-shot and few-shot learning of language models via chain-of-thought fine-tuning

S Kim, SJ Joo, D Kim, J Jang, S Ye, J Shin… - arXiv preprint arXiv …, 2023 - arxiv.org
Language models (LMs) with fewer than 100B parameters are known to perform poorly on
chain-of-thought (CoT) reasoning in contrast to large LMs when solving unseen tasks. In this …

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Inverse scaling: When bigger isn't better

IR McKenzie, A Lyzhov, M Pieler, A Parrish… - arXiv preprint arXiv …, 2023 - arxiv.org
Work on scaling laws has found that large language models (LMs) show predictable
improvements to overall loss with increased scale (model size, training data, and compute) …

Instruction-following evaluation through verbalizer manipulation

S Li, J Yan, H Wang, Z Tang, X Ren… - arXiv preprint arXiv …, 2023 - arxiv.org
While instruction-tuned models have shown remarkable success in various natural
language processing tasks, accurately evaluating their ability to follow instructions remains …

Language models are not naysayers: An analysis of language models on negation benchmarks

TH Truong, T Baldwin, K Verspoor, T Cohn - arXiv preprint arXiv …, 2023 - arxiv.org
Negation has been shown to be a major bottleneck for masked language models, such as
BERT. However, whether this finding still holds for larger-sized auto-regressive language …