Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text
S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …
From Google Gemini to OpenAI Q* (Q-Star): A survey of reshaping the generative artificial intelligence (AI) research landscape
This comprehensive survey explored the evolving landscape of generative Artificial
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …
Gemini: a family of highly capable multimodal models
G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac… - arXiv preprint arXiv …, 2023 - arxiv.org
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable
capabilities across image, audio, video, and text understanding. The Gemini family consists …
FActScore: Fine-grained atomic evaluation of factual precision in long form text generation
Evaluating the factuality of long-form text generated by large language models (LMs) is non-
trivial because (1) generations often contain a mixture of supported and unsupported pieces …
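As a rough illustration of the fine-grained, atomic evaluation this snippet alludes to, the sketch below assumes a generation has already been decomposed into atomic facts and that a placeholder `is_supported` checker verifies each fact against a knowledge source; factual precision is then the fraction of supported facts. The function name and toy data are assumptions for illustration, not the paper's implementation.

```python
from typing import Callable, List

def factual_precision(atomic_facts: List[str],
                      is_supported: Callable[[str], bool]) -> float:
    """Fraction of atomic facts judged supported by a knowledge source.

    `atomic_facts` is assumed to come from a prior fact-decomposition step;
    `is_supported` stands in for retrieval plus verification.
    """
    if not atomic_facts:
        return 0.0
    supported = sum(1 for fact in atomic_facts if is_supported(fact))
    return supported / len(atomic_facts)

# Toy checker that "knows" only one fact.
known = {"a fact the knowledge source supports"}
score = factual_precision(
    ["a fact the knowledge source supports", "a fact it does not"],
    lambda fact: fact in known,
)
print(f"precision = {score:.2f}")  # prints: precision = 0.50
```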
Generative language models and automated influence operations: Emerging threats and potential mitigations
Generative language models have improved drastically, and can now produce realistic text
outputs that are difficult to distinguish from human-written content. For malicious actors …
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
M Reid, N Savinov, D Teplyashin, D Lepikhin… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we present the latest model of the Gemini family, Gemini 1.5 Pro, a highly
compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning …
Chain-of-thought prompting elicits reasoning in large language models
We explore how generating a chain of thought---a series of intermediate reasoning steps---
significantly improves the ability of large language models to perform complex reasoning. In …
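A minimal sketch of the prompting pattern this snippet describes, assuming a plain text-completion setup: the few-shot exemplar spells out intermediate reasoning steps before the answer, nudging the model to produce its own chain of thought for a new question. The exemplar text and the `build_cot_prompt` helper are illustrative, not taken from the paper.

```python
# Chain-of-thought prompting sketch: the exemplar shows explicit
# intermediate reasoning steps, then the final answer.
COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend a worked, step-by-step exemplar to the new question."""
    return COT_EXEMPLAR + f"Q: {question}\nA:"

if __name__ == "__main__":
    print(build_cot_prompt(
        "A cafeteria had 23 apples. It used 20 and bought 6 more. "
        "How many apples does it have?"
    ))
```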
LaMDA: Language models for dialog applications
We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of
Transformer-based neural language models specialized for dialog, which have up to 137B …
Chain-of-verification reduces hallucination in large language models
Generation of plausible yet incorrect factual information, termed hallucination, is an
unsolved issue in large language models. We study the ability of language models to …
HaluEval: A large-scale hallucination evaluation benchmark for large language models
Large language models (LLMs), such as ChatGPT, are prone to generating hallucinations, i.e.,
content that conflicts with the source or cannot be verified against factual knowledge. To …