Efficient methods for natural language processing: A survey

M Treviso, JU Lee, T Ji, B Aken, Q Cao… - Transactions of the …, 2023 - direct.mit.edu
Recent work in natural language processing (NLP) has yielded appealing results from
scaling model parameters and training data; however, using only scale to improve …

Quality-aware decoding for neural machine translation

P Fernandes, A Farinhas, R Rei, JGC de Souza… - arXiv preprint arXiv …, 2022 - arxiv.org
Despite the progress in machine translation quality estimation and evaluation in recent
years, decoding in neural machine translation (NMT) is mostly oblivious to this and centers …

Understanding and detecting hallucinations in neural machine translation via model introspection

W Xu, S Agrawal, E Briakou, MJ Martindale… - Transactions of the …, 2023 - direct.mit.edu
Neural sequence generation models are known to “hallucinate” by producing outputs that
are unrelated to the source text. These hallucinations are potentially harmful, yet it remains …

UniTE: Unified translation evaluation

Y Wan, D Liu, B Yang, H Zhang, B Chen… - arXiv preprint arXiv …, 2022 - arxiv.org
Translation quality evaluation plays a crucial role in machine translation. According to the
input format, it is mainly separated into three tasks, i.e., reference-only, source-only and …

MENLI: Robust evaluation metrics from natural language inference

Y Chen, S Eger - Transactions of the Association for Computational …, 2023 - direct.mit.edu
Recently proposed BERT-based evaluation metrics for text generation perform well on
standard benchmarks but are vulnerable to adversarial attacks, e.g., relating to information …

Findings of the WMT 2021 shared task on quality estimation

L Specia, F Blain, M Fomicheva, C Zerva… - Proceedings of the …, 2021 - aclanthology.org
We report the results of the WMT 2021 shared task on Quality Estimation, where the
challenge is to predict the quality of the output of neural machine translation systems at the …

CodeScore: Evaluating code generation by learning code execution

Y Dong, J Ding, X Jiang, G Li, Z Li, Z Jin - arXiv preprint arXiv:2301.09043, 2023 - arxiv.org
A proper code evaluation metric (CEM) profoundly impacts the evolution of code generation,
which is an important research field in NLP and software engineering. Prevailing match …

Findings of the WMT 2023 shared task on quality estimation

F Blain, C Zerva, R Rei, NM Guerreiro… - Proceedings of the …, 2023 - aclanthology.org
We report the results of the WMT 2023 shared task on Quality Estimation, in which the
challenge is to predict the quality of the output of neural machine translation systems at the …

The inside story: Towards better understanding of machine translation neural evaluation metrics

R Rei, NM Guerreiro, M Treviso, L Coheur… - arXiv preprint arXiv …, 2023 - arxiv.org
Neural metrics for machine translation evaluation, such as COMET, exhibit significant
improvements in their correlation with human judgments, as compared to traditional metrics …

Towards explainable evaluation metrics for machine translation

C Leiter, P Lertvittayakumjorn, M Fomicheva… - Journal of Machine …, 2024 - jmlr.org
Unlike classical lexical overlap metrics such as BLEU, most current evaluation metrics for
machine translation (for example, COMET or BERTScore) are based on black-box large …