Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

Llama 2: Open foundation and fine-tuned chat models

H Touvron, L Martin, K Stone, P Albert… - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large
language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine …

Taxonomy of risks posed by language models

L Weidinger, J Uesato, M Rauh, C Griffin… - Proceedings of the …, 2022 - dl.acm.org
Responsible innovation on large-scale Language Models (LMs) requires foresight into and
in-depth understanding of the risks these models may pose. This paper develops a …

Human-level play in the game of Diplomacy by combining language models with strategic reasoning

Meta Fundamental AI Research Diplomacy Team … - Science, 2022 - science.org
Despite much progress in training artificial intelligence (AI) systems to imitate human
language, building agents that use language to communicate intentionally with humans in …

Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned

D Ganguli, L Lovitt, J Kernion, A Askell, Y Bai… - arXiv preprint arXiv …, 2022 - arxiv.org
We describe our early efforts to red team language models in order to simultaneously
discover, measure, and attempt to reduce their potentially harmful outputs. We make three …

LaMDA: Language models for dialog applications

R Thoppilan, D De Freitas, J Hall, N Shazeer… - arXiv preprint arXiv …, 2022 - arxiv.org
We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of
Transformer-based neural language models specialized for dialog, which have up to 137B …

A review of the explainability and safety of conversational agents for mental health to identify avenues for improvement

S Sarkar, M Gaur, LK Chen, M Garg… - Frontiers in Artificial …, 2023 - frontiersin.org
Virtual Mental Health Assistants (VMHAs) continuously evolve to support the overloaded
global healthcare system, which receives approximately 60 million primary care visits and 6 …

The capacity for moral self-correction in large language models

D Ganguli, A Askell, N Schiefer, TI Liao… - arXiv preprint arXiv …, 2023 - arxiv.org
We test the hypothesis that language models trained with reinforcement learning from
human feedback (RLHF) have the capability to "morally self-correct"--to avoid producing …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Predictability and surprise in large generative models

D Ganguli, D Hernandez, L Lovitt, A Askell… - Proceedings of the …, 2022 - dl.acm.org
Large-scale pre-training has recently emerged as a technique for creating capable, general-
purpose, generative models such as GPT-3, Megatron-Turing NLG, Gopher, and many …