Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text
S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Abstract: Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …
FActScore: Fine-grained atomic evaluation of factual precision in long form text generation
Evaluating the factuality of long-form text generated by large language models (LMs) is non-
trivial because (1) generations often contain a mixture of supported and unsupported pieces …
News summarization and evaluation in the era of GPT-3
The recent success of zero- and few-shot prompting with models like GPT-3 has led to a
paradigm shift in NLP research. In this paper, we study its impact on text summarization …
TRUE: Re-evaluating factual consistency evaluation
Grounded text generation systems often generate text that contains factual inconsistencies,
hindering their real-world applicability. Automatic factual consistency evaluation may help …
Efficient methods for natural language processing: A survey
Recent work in natural language processing (NLP) has yielded appealing results from
scaling model parameters and training data; however, using only scale to improve …
QAFactEval: Improved QA-based factual consistency evaluation for summarization
Factual consistency is an essential quality of text summarization models in practical settings.
Existing work in evaluating this dimension can be broadly categorized into two lines of …
Improving faithfulness in abstractive summarization with contrast candidate generation and selection
Despite significant progress in neural abstractive summarization, recent studies have shown
that the current models are prone to generating summaries that are unfaithful to the original …
mFACE: Multilingual summarization with factual consistency evaluation
Abstractive summarization has enjoyed renewed interest in recent years, thanks to pre-
trained language models and the availability of large-scale datasets. Despite promising …
Zero-shot faithful factual error correction
Faithfully correcting factual errors is critical for maintaining the integrity of textual knowledge
bases and preventing hallucinations in sequence-to-sequence models. Drawing on humans' …
MENLI: Robust evaluation metrics from natural language inference
Recently proposed BERT-based evaluation metrics for text generation perform well on
standard benchmarks but are vulnerable to adversarial attacks, e.g., relating to information …