Factscore: Fine-grained atomic evaluation of factual precision in long form text generation

S Min, K Krishna, X Lyu, M Lewis, W Yih… - arXiv preprint arXiv …, 2023 - arxiv.org
Evaluating the factuality of long-form text generated by large language models (LMs) is
non-trivial because (1) generations often contain a mixture of supported and unsupported pieces …

A survey on automated fact-checking

Z Guo, M Schlichtkrull, A Vlachos - Transactions of the Association for …, 2022 - direct.mit.edu
Fact-checking has become increasingly important due to the speed with which both
information and misinformation can spread in the modern media ecosystem. Therefore …

Self-critiquing models for assisting human evaluators

W Saunders, C Yeh, J Wu, S Bills, L Ouyang… - arXiv preprint arXiv …, 2022 - arxiv.org
We fine-tune large language models to write natural language critiques (natural language
critical comments) using behavioral cloning. On a topic-based summarization task, critiques …

Rarr: Researching and revising what language models say, using language models

L Gao, Z Dai, P Pasupat, A Chen, AT Chaganty… - arXiv preprint arXiv …, 2022 - arxiv.org
Language models (LMs) now excel at many tasks such as few-shot learning, question
answering, reasoning, and dialog. However, they sometimes generate unsupported or …

Internet-augmented language models through few-shot prompting for open-domain question answering

A Lazaridou, E Gribovskaya, W Stokowiec… - arXiv preprint arXiv …, 2022 - arxiv.org
In this work, we aim to capitalize on the unique few-shot capabilities of large-scale language
models (LSLMs) to overcome some of their challenges with respect to grounding to factual …

Recursively summarizing books with human feedback

J Wu, L Ouyang, DM Ziegler, N Stiennon… - arXiv preprint arXiv …, 2021 - arxiv.org
A major challenge for scaling machine learning is training models to perform tasks that are
very difficult or time-consuming for humans to evaluate. We present progress on this …

Automated fact-checking for assisting human fact-checkers

P Nakov, D Corney, M Hasanain, F Alam… - arXiv preprint arXiv …, 2021 - arxiv.org
The reporting and the analysis of current events around the globe has expanded from
professional, editor-led journalism all the way to citizen journalism. Nowadays, politicians …

LongEval: Guidelines for human evaluation of faithfulness in long-form summarization

K Krishna, E Bransom, B Kuehl, M Iyyer… - arXiv preprint arXiv …, 2023 - arxiv.org
While human evaluation remains best practice for accurately judging the faithfulness of
automatically-generated summaries, few solutions exist to address the increased difficulty …

The state of human-centered NLP technology for fact-checking

A Das, H Liu, V Kovatchev, M Lease - Information processing & …, 2023 - Elsevier
Misinformation threatens modern society by promoting distrust in science, changing
narratives in public health, heightening social polarization, and disrupting democratic …

Evidence-based fact-checking of health-related claims

M Sarrouti, AB Abacha, Y M'rabet… - Findings of the …, 2021 - aclanthology.org
The task of verifying the truthfulness of claims in textual documents, or fact-checking, has
received significant attention in recent years. Many existing evidence-based fact-checking …