Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

FEQA: A question answering evaluation framework for faithfulness assessment in abstractive summarization

E Durmus, H He, M Diab - arXiv preprint arXiv:2005.03754, 2020 - arxiv.org
Neural abstractive summarization models are prone to generate content inconsistent with
the source document, i.e., unfaithful. Existing automatic metrics do not capture such mistakes …

QuestEval: Summarization asks for fact-based evaluation

T Scialom, PA Dray, P Gallinari, S Lamprier… - arXiv preprint arXiv …, 2021 - arxiv.org
Summarization evaluation remains an open research problem: current metrics such as
ROUGE are known to be limited and to correlate poorly with human judgments. To alleviate …

Answers unite! unsupervised metrics for reinforced summarization models

T Scialom, S Lamprier, B Piwowarski… - arXiv preprint arXiv …, 2019 - arxiv.org
Abstractive summarization approaches based on Reinforcement Learning (RL) have
recently been proposed to overcome classical likelihood maximization. RL enables to …

A comprehensive review of automatic text summarization techniques: method, data, evaluation and coding

DO Cajueiro, AG Nery, I Tavares, MK De Melo… - arXiv preprint arXiv …, 2023 - arxiv.org
We provide a literature review about Automatic Text Summarization (ATS) systems. We
consider a citation-based approach. We start with some popular and well-known papers that …

Fill in the BLANC: Human-free quality estimation of document summaries

O Vasilyev, V Dharnidharka, J Bohannon - arXiv preprint arXiv …, 2020 - arxiv.org
We present BLANC, a new approach to the automatic estimation of document summary
quality. Our goal is to measure the functional performance of a summary with an objective …

Factual consistency evaluation for text summarization via counterfactual estimation

Y Xie, F Sun, Y Deng, Y Li, B Ding - arXiv preprint arXiv:2108.13134, 2021 - arxiv.org
Although significant progress has been achieved in text summarization, factual inconsistency
in generated summaries still severely limits its practical applications. Among the key factors …

MQAG: Multiple-choice question answering and generation for assessing information consistency in summarization

P Manakul, A Liusie, MJF Gales - arXiv preprint arXiv:2301.12307, 2023 - arxiv.org
State-of-the-art summarization systems can generate highly fluent summaries. These
summaries, however, may contain factual inconsistencies and/or information not present in …

Document processing: Methods for semantic text similarity analysis

AW Qurashi, V Holmes… - … on INnovations in …, 2020 - ieeexplore.ieee.org
The document text similarity measurement and analysis is a growing application of Natural
Language Processing. This paper presents the results of using different techniques for …

Unsupervised reference-free summary quality evaluation via contrastive learning

H Wu, T Ma, L Wu, T Manyumwa, S Ji - arXiv preprint arXiv:2010.01781, 2020 - arxiv.org
Evaluation of a document summarization system has been a critical factor to impact the
success of the summarization task. Previous approaches, such as ROUGE, mainly consider …