Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

From Google Gemini to OpenAI Q* (Q-Star): A survey of reshaping the generative artificial intelligence (AI) research landscape

TR McIntosh, T Susnjak, T Liu, P Watters… - arXiv preprint arXiv …, 2023 - arxiv.org
This comprehensive survey explored the evolving landscape of generative Artificial
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …

Gemini: A family of highly capable multimodal models

G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac… - arXiv preprint arXiv …, 2023 - arxiv.org
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable
capabilities across image, audio, video, and text understanding. The Gemini family consists …

FActScore: Fine-grained atomic evaluation of factual precision in long-form text generation

S Min, K Krishna, X Lyu, M Lewis, W Yih… - arXiv preprint arXiv …, 2023 - arxiv.org
Evaluating the factuality of long-form text generated by large language models (LMs) is non-
trivial because (1) generations often contain a mixture of supported and unsupported pieces …

Generative language models and automated influence operations: Emerging threats and potential mitigations

JA Goldstein, G Sastry, M Musser, R DiResta… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative language models have improved drastically, and can now produce realistic text
outputs that are difficult to distinguish from human-written content. For malicious actors …

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

M Reid, N Savinov, D Teplyashin, D Lepikhin… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we present the latest model of the Gemini family, Gemini 1.5 Pro, a highly
compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning …

Chain-of-thought prompting elicits reasoning in large language models

J Wei, X Wang, D Schuurmans… - Advances in neural …, 2022 - proceedings.neurips.cc
We explore how generating a chain of thought (a series of intermediate reasoning steps)
significantly improves the ability of large language models to perform complex reasoning. In …

LaMDA: Language models for dialog applications

R Thoppilan, D De Freitas, J Hall, N Shazeer… - arXiv preprint arXiv …, 2022 - arxiv.org
We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of
Transformer-based neural language models specialized for dialog, which have up to 137B …

Chain-of-verification reduces hallucination in large language models

S Dhuliawala, M Komeili, J Xu, R Raileanu, X Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Generation of plausible yet incorrect factual information, termed hallucination, is an
unsolved issue in large language models. We study the ability of language models to …

HaluEval: A large-scale hallucination evaluation benchmark for large language models

J Li, X Cheng, WX Zhao, JY Nie, JR Wen - arXiv preprint arXiv:2305.11747, 2023 - arxiv.org
Large language models (LLMs), such as ChatGPT, are prone to generating hallucinations, i.e.,
content that conflicts with the source or cannot be verified against factual knowledge. To …