Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

Llama 2: Open foundation and fine-tuned chat models

H Touvron, L Martin, K Stone, P Albert… - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large
language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine …

Taxonomy of risks posed by language models

L Weidinger, J Uesato, M Rauh, C Griffin… - Proceedings of the …, 2022 - dl.acm.org
Responsible innovation on large-scale Language Models (LMs) requires foresight into and
in-depth understanding of the risks these models may pose. This paper develops a …

Human-level play in the game of Diplomacy by combining language models with strategic reasoning

Meta Fundamental AI Research Diplomacy Team … - Science, 2022 - science.org
Despite much progress in training artificial intelligence (AI) systems to imitate human
language, building agents that use language to communicate intentionally with humans in …

Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned

D Ganguli, L Lovitt, J Kernion, A Askell, Y Bai… - arXiv preprint arXiv …, 2022 - arxiv.org
We describe our early efforts to red team language models in order to simultaneously
discover, measure, and attempt to reduce their potentially harmful outputs. We make three …

LaMDA: Language models for dialog applications

R Thoppilan, D De Freitas, J Hall, N Shazeer… - arXiv preprint arXiv …, 2022 - arxiv.org
We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of
Transformer-based neural language models specialized for dialog, which have up to 137B …

A review of the explainability and safety of conversational agents for mental health to identify avenues for improvement

S Sarkar, M Gaur, LK Chen, M Garg… - Frontiers in Artificial …, 2023 - frontiersin.org
Virtual Mental Health Assistants (VMHAs) continuously evolve to support the overloaded
global healthcare system, which receives approximately 60 million primary care visits and 6 …

The capacity for moral self-correction in large language models

D Ganguli, A Askell, N Schiefer, TI Liao… - arXiv preprint arXiv …, 2023 - arxiv.org
We test the hypothesis that language models trained with reinforcement learning from
human feedback (RLHF) have the capability to "morally self-correct"--to avoid producing …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Predictability and surprise in large generative models

D Ganguli, D Hernandez, L Lovitt, A Askell… - Proceedings of the …, 2022 - dl.acm.org
Large-scale pre-training has recently emerged as a technique for creating capable, general-
purpose, generative models such as GPT-3, Megatron-Turing NLG, Gopher, and many …