- Academic resource search

COMET-22: Unbabel-IST 2022 submission for the metrics shared task

R Rei, JGC De Souza, D Alves, C Zerva… - Proceedings of the …, 2022 - aclanthology.org

In this paper, we present the joint contribution of Unbabel and IST to the WMT 2022 Metrics
Shared Task. Our primary submission–dubbed COMET-22–is an ensemble between a …

Cited by 212

Results of the WMT21 metrics shared task: Evaluating metrics with expert-based human evaluations on TED and news domain

M Freitag, R Rei, N Mathur, C Lo… - Proceedings of the …, 2021 - aclanthology.org

This paper presents the results of the WMT21 Metrics Shared Task. Participants were asked
to score the outputs of the translation systems competing in the WMT21 News Translation …

Cited by 171

Large language models effectively leverage document-level context for literary translation, but critical errors persist

M Karpinska, M Iyyer - arXiv preprint arXiv:2304.03245, 2023 - arxiv.org

Large language models (LLMs) are competitive with the state of the art on a wide range of
sentence-level translation datasets. However, their ability to translate paragraphs and …

Cited by 83

Quality-aware decoding for neural machine translation

P Fernandes, A Farinhas, R Rei, JGC de Souza… - arXiv preprint arXiv …, 2022 - arxiv.org

Despite the progress in machine translation quality estimation and evaluation in the last
years, decoding in neural machine translation (NMT) is mostly oblivious to this and centers …

Cited by 79

High quality rather than high model probability: Minimum Bayes risk decoding with neural metrics

M Freitag, D Grangier, Q Tan, B Liang - Transactions of the …, 2022 - direct.mit.edu

In Neural Machine Translation, it is typically assumed that the sentence with the
highest estimated probability should also be the translation with the highest quality as …

Cited by 67

CodeScore: Evaluating code generation by learning code execution

Y Dong, J Ding, X Jiang, G Li, Z Li, Z Jin - arXiv preprint arXiv:2301.09043, 2023 - arxiv.org

A proper code evaluation metric (CEM) profoundly impacts the evolution of code generation,
which is an important research field in NLP and software engineering. Prevailing match …

Cited by 39

Identifying weaknesses in machine translation metrics through minimum Bayes risk decoding: A case study for COMET

C Amrhein, R Sennrich - arXiv preprint arXiv:2202.05148, 2022 - arxiv.org

Neural metrics have achieved impressive correlation with human judgements in the
evaluation of machine translation systems, but before we can safely optimise towards such …

Cited by 48

Findings of the WMT 2023 shared task on quality estimation

F Blain, C Zerva, R Rei, NM Guerreiro… - Proceedings of the …, 2023 - aclanthology.org

We report the results of the WMT 2023 shared task on Quality Estimation, in which the
challenge is to predict the quality of the output of neural machine translation systems at the …

Cited by 24

GEMBA-MQM: Detecting translation quality error spans with GPT-4

T Kocmi, C Federmann - arXiv preprint arXiv:2310.13988, 2023 - arxiv.org

This paper introduces GEMBA-MQM, a GPT-based evaluation metric designed to detect
translation quality errors, specifically for the quality estimation setting without the need for …

Cited by 44

On the evaluation metrics for paraphrase generation

L Shen, L Liu, H Jiang, S Shi - arXiv preprint arXiv:2202.08479, 2022 - arxiv.org

In this paper we revisit automatic metrics for paraphrase evaluation and obtain two findings
that disobey conventional wisdom: (1) Reference-free metrics achieve better performance …

Cited by 40