Marina Fomicheva, Chrysoula Zerva, Zhenhao Li, Vishrav Chaudhary, and André FT Martins....

R Rei, JGC De Souza, D Alves, C Zerva… - Proceedings of the …, 2022 - aclanthology.org

In this paper, we present the joint contribution of Unbabel and IST to the WMT 2022 Metrics
Shared Task. Our primary submission–dubbed COMET-22–is an ensemble between a …

Cited by 146 · Related articles

Findings of the 2021 conference on machine translation (WMT21)

A Farhad, A Arkady, B Magdalena, B Ondřej… - Proceedings of the …, 2021 - cris.fbk.eu

This paper presents the results of the news translation task, the multilingual low-resource
translation for Indo-European languages, the triangular translation task, and the automatic …

Cited by 166 · Related articles · All 19 versions

Quality-aware decoding for neural machine translation

P Fernandes, A Farinhas, R Rei, JGC de Souza… - arXiv preprint arXiv …, 2022 - arxiv.org

Despite the progress in machine translation quality estimation and evaluation in the last
years, decoding in neural machine translation (NMT) is mostly oblivious to this and centers …

Cited by 62 · Related articles · All 7 versions

UniTE: Unified translation evaluation

Y Wan, D Liu, B Yang, H Zhang, B Chen… - arXiv preprint arXiv …, 2022 - arxiv.org

Translation quality evaluation plays a crucial role in machine translation. According to the
input format, it is mainly separated into three tasks, i.e., reference-only, source-only and …

Cited by 55 · Related articles · All 6 versions

Toward human-like evaluation for natural language generation with error analysis

Q Lu, L Ding, L Xie, K Zhang, DF Wong… - arXiv preprint arXiv …, 2022 - arxiv.org

The state-of-the-art language model-based automatic metrics, e.g., BARTScore, benefiting
from large-scale contextualized pre-training, have been successfully used in a wide range of …

Cited by 24 · Related articles · All 5 versions

The eval4nlp 2023 shared task on prompting large language models as explainable metrics

C Leiter, J Opitz, D Deutsch, Y Gao, R Dror… - arXiv preprint arXiv …, 2023 - arxiv.org

With an increasing number of parameters and pre-training data, generative large language
models (LLMs) have shown remarkable capabilities to solve tasks with minimal or no task …

Cited by 13 · Related articles · All 5 versions

Findings of the WMT 2023 shared task on quality estimation

F Blain, C Zerva, R Rei, NM Guerreiro… - Proceedings of the …, 2023 - aclanthology.org

We report the results of the WMT 2023 shared task on Quality Estimation, in which the
challenge is to predict the quality of the output of neural machine translation systems at the …

Cited by 9 · Related articles · All 5 versions

On the limitations of reference-free evaluations of generated text

D Deutsch, R Dror, D Roth - arXiv preprint arXiv:2210.12563, 2022 - arxiv.org

There is significant interest in developing evaluation metrics which accurately estimate the
quality of generated text without the aid of a human-written reference text, which can be time …

Cited by 24 · Related articles · All 6 versions

Optimal transport for unsupervised hallucination detection in neural machine translation

NM Guerreiro, P Colombo, P Piantanida… - arXiv preprint arXiv …, 2022 - arxiv.org

Neural machine translation (NMT) has become the de-facto standard in real-world machine
translation applications. However, NMT models can unpredictably produce severely …

Cited by 18 · Related articles · All 9 versions

Transformers go for the LOLs: Generating (humourous) titles from scientific abstracts end-to-end

Y Chen, S Eger - arXiv preprint arXiv:2212.10522, 2022 - arxiv.org

We consider the end-to-end abstract-to-title generation problem, exploring seven recent
transformer based models (including ChatGPT) fine-tuned on more than 30k abstract-title …

Cited by 16 · Related articles · All 4 versions