FactGraph: Evaluating factuality in summarization with semantic graph representations

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org

Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

Cited by 151 · Related articles · All 6 versions

TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu, Q Zhang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …

Cited by 232 · Related articles · All 4 versions

Position: TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu… - International …, 2024 - proceedings.mlr.press

Large language models (LLMs) have gained considerable attention for their excellent
natural language processing capabilities. Nonetheless, these LLMs present many …

Cited by 16 · Related articles

Zero-shot faithful factual error correction

KH Huang, HP Chan, H Ji - arXiv preprint arXiv:2305.07982, 2023 - arxiv.org

Faithfully correcting factual errors is critical for maintaining the integrity of textual knowledge
bases and preventing hallucinations in sequence-to-sequence models. Drawing on humans' …

Cited by 34 · Related articles · All 8 versions

FactKB: Generalizable factuality evaluation using language models enhanced with factual knowledge

S Feng, V Balachandran, Y Bai, Y Tsvetkov - arXiv preprint arXiv …, 2023 - arxiv.org

Evaluating the factual consistency of automatically generated summaries is essential for the
progress and adoption of reliable summarization systems. Despite recent advances, existing …

Cited by 39 · Related articles · All 4 versions

A meta-evaluation of faithfulness metrics for long-form hospital-course summarization

G Adams, J Zucker, N Elhadad - Machine Learning for …, 2023 - proceedings.mlr.press

Long-form clinical summarization of hospital admissions has real-world significance
because of its potential to help both clinicians and patients. The factual consistency of …

Cited by 22 · Related articles · All 6 versions

Faithfulness-aware decoding strategies for abstractive summarization

D Wan, M Liu, K McKeown, M Dreyer… - arXiv preprint arXiv …, 2023 - arxiv.org

Despite significant progress in understanding and improving faithfulness in abstractive
summarization, the question of how decoding strategies affect faithfulness is less studied …

Cited by 30 · Related articles · All 5 versions

How Far are We from Robust Long Abstractive Summarization?

HY Koh, J Ju, H Zhang, M Liu, S Pan - arXiv preprint arXiv:2210.16732, 2022 - arxiv.org

Abstractive summarization has made tremendous progress in recent years. In this work, we
perform fine-grained human annotations to evaluate long document abstractive …

Cited by 26 · Related articles · All 5 versions

Interpretable automatic fine-grained inconsistency detection in text summarization

HP Chan, Q Zeng, H Ji - arXiv preprint arXiv:2305.14548, 2023 - arxiv.org

Existing factual consistency evaluation approaches for text summarization provide binary
predictions and limited insights into the weakness of summarization systems. Therefore, we …

Cited by 12 · Related articles · All 6 versions

Evaluate AMR graph similarity via self-supervised learning

Z Shou, F Lin - Proceedings of the 61st Annual Meeting of the …, 2023 - aclanthology.org

In work on AMR (Abstract Meaning Representation), similarity metrics are crucial as they are
used to evaluate AMR systems such as AMR parsers. Current AMR metrics are all based on …

Cited by 6 · Related articles · All 3 versions