A survey of evaluation metrics used for NLG systems

AB Sai, AK Mohankumar, MM Khapra - ACM Computing Surveys (CSUR), 2022 - dl.acm.org
In the last few years, a large number of automatic evaluation metrics have been proposed for
evaluating Natural Language Generation (NLG) systems. The rapid development and …

Evaluation of text generation: A survey

A Celikyilmaz, E Clark, J Gao - arXiv preprint arXiv:2006.14799, 2020 - arxiv.org
The paper surveys evaluation methods of natural language generation (NLG) systems that
have been developed in the last few years. We group NLG evaluation methods into three …

BERTScore: Evaluating text generation with BERT

T Zhang, V Kishore, F Wu, KQ Weinberger… - arXiv preprint arXiv …, 2019 - arxiv.org
We propose BERTScore, an automatic evaluation metric for text generation. Analogously to
common metrics, BERTScore computes a similarity score for each token in the candidate …
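
For orientation, the core of BERTScore is a greedy cosine matching between contextual token embeddings of the candidate and the reference, aggregated into precision, recall, and F1. The sketch below shows only that aggregation step, with random vectors standing in for real BERT embeddings; the helper name bertscore_like is ours, for illustration.

# Minimal sketch of the greedy-matching aggregation behind BERTScore.
# Assumes contextual token embeddings were already computed (e.g., with BERT);
# random vectors are used here purely to make the snippet runnable.
import numpy as np

def bertscore_like(cand_emb: np.ndarray, ref_emb: np.ndarray):
    """cand_emb: (n_cand, d), ref_emb: (n_ref, d) L2-normalised token embeddings."""
    sim = cand_emb @ ref_emb.T                 # pairwise cosine similarities
    precision = sim.max(axis=1).mean()         # each candidate token -> best reference token
    recall = sim.max(axis=0).mean()            # each reference token -> best candidate token
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

def normed(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(0)
cand = normed(rng.normal(size=(7, 768)))       # 7 candidate tokens, 768-dim embeddings
ref = normed(rng.normal(size=(9, 768)))        # 9 reference tokens
print(bertscore_like(cand, ref))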

Results of the WMT19 metrics shared task: Segment-level and strong MT systems pose big challenges

Q Ma, JTZ Wei, O Bojar, Y Graham - 2019 - doras.dcu.ie
This paper presents the results of the WMT19 Metrics Shared Task. Participants were asked
to score the outputs of the translation systems competing in the WMT19 News Translation …

Automatic machine translation evaluation in many languages via zero-shot paraphrasing

B Thompson, M Post - arXiv preprint arXiv:2004.14564, 2020 - arxiv.org
We frame the task of machine translation evaluation as one of scoring machine translation
output with a sequence-to-sequence paraphraser, conditioned on a human reference. We …
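
The scoring idea lends itself to a short sketch: teacher-force the system output through a sequence-to-sequence model conditioned on the human reference and read off the per-token log-likelihood. The checkpoint and helper below (t5-small, paraphrase_score) are stand-ins chosen for illustration, not the authors' multilingual paraphraser.

# Minimal sketch: score a hypothesis by the (length-normalised) log-probability a
# seq2seq model assigns to it when conditioned on the reference. The t5-small
# checkpoint is only a placeholder for a trained paraphraser.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small").eval()

def paraphrase_score(reference: str, hypothesis: str) -> float:
    enc = tok(reference, return_tensors="pt")
    labels = tok(hypothesis, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(**enc, labels=labels)      # teacher-forced decoding of the hypothesis
    return -out.loss.item()                    # mean per-token log-likelihood

print(paraphrase_score("The cat sat on the mat.", "A cat was sitting on the mat."))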

Biologically inspired design concept generation using generative pre-trained transformers

Q Zhu, X Zhang, J Luo - Journal of Mechanical …, 2023 - asmedigitalcollection.asme.org
Biological systems in nature have evolved for millions of years to adapt to and survive their
environment. Many features they developed can be inspirational and beneficial for solving …

Automatic text evaluation through the lens of Wasserstein barycenters

P Colombo, G Staerman, C Clavel… - arXiv preprint arXiv …, 2021 - arxiv.org
A new metric, BaryScore, to evaluate text generation based on deep contextualized
embeddings (e.g., BERT, RoBERTa, ELMo) is introduced. This metric is motivated by a new …
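
As a rough illustration of the optimal-transport framing suggested by the title, the sketch below compares candidate and reference token embeddings as discrete distributions under a Wasserstein cost. The POT-based code and random embeddings are our assumption, not the paper's implementation.

# Minimal sketch: treat candidate and reference as uniform distributions over their
# contextual token embeddings and compare them with an exact optimal-transport cost.
# Random vectors stand in for real BERT embeddings.
import numpy as np
import ot  # pip install POT

rng = np.random.default_rng(0)
cand_emb = rng.normal(size=(7, 768))   # 7 candidate-token embeddings (stand-ins)
ref_emb = rng.normal(size=(9, 768))    # 9 reference-token embeddings (stand-ins)

a = ot.unif(len(cand_emb))             # uniform weight on each candidate token
b = ot.unif(len(ref_emb))              # uniform weight on each reference token
M = ot.dist(cand_emb, ref_emb)         # pairwise squared-Euclidean cost matrix
w_cost = ot.emd2(a, b, M)              # exact optimal-transport (Wasserstein) cost
print(w_cost)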

Toward human-like evaluation for natural language generation with error analysis

Q Lu, L Ding, L Xie, K Zhang, DF Wong… - arXiv preprint arXiv …, 2022 - arxiv.org
The state-of-the-art language model-based automatic metrics, e.g., BARTScore, benefiting
from large-scale contextualized pre-training, have been successfully used in a wide range of …

InfoLM: A new metric to evaluate summarization & data2text generation

PJA Colombo, C Clavel, P Piantanida - Proceedings of the AAAI …, 2022 - ojs.aaai.org
Assessing the quality of natural language generation (NLG) systems through human
annotation is very expensive. Additionally, human annotation campaigns are time …

Perturbation CheckLists for evaluating NLG evaluation metrics

AB Sai, T Dixit, DY Sheth, S Mohan… - arXiv preprint arXiv …, 2021 - arxiv.org
Natural Language Generation (NLG) evaluation is a multifaceted task requiring assessment
of multiple desirable criteria, e.g., fluency, coherency, coverage, relevance, adequacy, overall …