Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text
S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …
FEQA: A question answering evaluation framework for faithfulness assessment in abstractive summarization
Neural abstractive summarization models are prone to generate content inconsistent with
the source document, i.e., unfaithful. Existing automatic metrics do not capture such mistakes …
QuestEval: Summarization asks for fact-based evaluation
Summarization evaluation remains an open research problem: current metrics such as
ROUGE are known to be limited and to correlate poorly with human judgments. To alleviate …
Answers unite! unsupervised metrics for reinforced summarization models
Abstractive summarization approaches based on Reinforcement Learning (RL) have
recently been proposed to overcome classical likelihood maximization. RL enables to …
A comprehensive review of automatic text summarization techniques: method, data, evaluation and coding
We provide a literature review about Automatic Text Summarization (ATS) systems. We
consider a citation-based approach. We start with some popular and well-known papers that …
Fill in the BLANC: Human-free quality estimation of document summaries
O Vasilyev, V Dharnidharka, J Bohannon - arXiv preprint arXiv …, 2020 - arxiv.org
We present BLANC, a new approach to the automatic estimation of document summary
quality. Our goal is to measure the functional performance of a summary with an objective …
Factual consistency evaluation for text summarization via counterfactual estimation
Although significant progress has been achieved in text summarization, factual inconsistency
in generated summaries still severely limits its practical applications. Among the key factors …
MQAG: Multiple-choice question answering and generation for assessing information consistency in summarization
State-of-the-art summarization systems can generate highly fluent summaries. These
summaries, however, may contain factual inconsistencies and/or information not present in …
Document processing: Methods for semantic text similarity analysis
AW Qurashi, V Holmes… - … on INnovations in …, 2020 - ieeexplore.ieee.org
The document text similarity measurement and analysis is a growing application of Natural
Language Processing. This paper presents the results of using different techniques for …
Unsupervised reference-free summary quality evaluation via contrastive learning
Evaluation of a document summarization system has been a critical factor to impact the
success of the summarization task. Previous approaches, such as ROUGE, mainly consider …