Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text
S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …
From Google Gemini to OpenAI Q* (Q-Star): A survey of reshaping the generative artificial intelligence (AI) research landscape
This comprehensive survey explored the evolving landscape of generative Artificial
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …
Gemini: a family of highly capable multimodal models
G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac… - arXiv preprint arXiv …, 2023 - arxiv.org
This report introduces a new family of multimodal models, Gemini, that exhibit remarkable
capabilities across image, audio, video, and text understanding. The Gemini family consists …
FActScore: Fine-grained atomic evaluation of factual precision in long form text generation
Evaluating the factuality of long-form text generated by large language models (LMs) is non-
trivial because (1) generations often contain a mixture of supported and unsupported pieces …
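As a rough illustration of the fine-grained, atomic evaluation this snippet alludes to, the sketch below assumes a generation has already been decomposed into atomic facts and that a placeholder `is_supported` checker verifies each fact against a knowledge source; factual precision is then the fraction of supported facts. The function name and toy data are assumptions for illustration, not the paper's implementation.

```python
from typing import Callable, List

def factual_precision(atomic_facts: List[str],
                      is_supported: Callable[[str], bool]) -> float:
    """Fraction of atomic facts judged supported by a knowledge source.

    `atomic_facts` is assumed to come from a prior fact-decomposition step;
    `is_supported` stands in for retrieval plus verification.
    """
    if not atomic_facts:
        return 0.0
    supported = sum(1 for fact in atomic_facts if is_supported(fact))
    return supported / len(atomic_facts)

# Toy checker that "knows" only one fact.
known = {"a fact the knowledge source supports"}
score = factual_precision(
    ["a fact the knowledge source supports", "a fact it does not"],
    lambda fact: fact in known,
)
print(f"precision = {score:.2f}")  # prints: precision = 0.50
```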
Generative language models and automated influence operations: Emerging threats and potential mitigations
Generative language models have improved drastically, and can now produce realistic text
outputs that are difficult to distinguish from human-written content. For malicious actors …
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
M Reid, N Savinov, D Teplyashin, D Lepikhin… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we present the latest model of the Gemini family, Gemini 1.5 Pro, a highly
compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning …
Chain-of-thought prompting elicits reasoning in large language models
We explore how generating a chain of thought---a series of intermediate reasoning steps---
significantly improves the ability of large language models to perform complex reasoning. In …
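A minimal sketch of the prompting pattern this snippet describes, assuming a plain text-completion setup: the few-shot exemplar spells out intermediate reasoning steps before the answer, nudging the model to produce its own chain of thought for a new question. The exemplar text and the `build_cot_prompt` helper are illustrative, not taken from the paper.

```python
# Chain-of-thought prompting sketch: the exemplar shows explicit
# intermediate reasoning steps, then the final answer.
COT_EXEMPLAR = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
)

def build_cot_prompt(question: str) -> str:
    """Prepend a worked, step-by-step exemplar to the new question."""
    return COT_EXEMPLAR + f"Q: {question}\nA:"

if __name__ == "__main__":
    print(build_cot_prompt(
        "A cafeteria had 23 apples. It used 20 and bought 6 more. "
        "How many apples does it have?"
    ))
```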
LaMDA: Language models for dialog applications
We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of
Transformer-based neural language models specialized for dialog, which have up to 137B …
Chain-of-verification reduces hallucination in large language models
Generation of plausible yet incorrect factual information, termed hallucination, is an
unsolved issue in large language models. We study the ability of language models to …
HaluEval: A large-scale hallucination evaluation benchmark for large language models
Large language models (LLMs), such as ChatGPT, are prone to generating hallucinations, i.e.,
content that conflicts with the source or cannot be verified against factual knowledge. To …