Exploring human-like translation strategy with large language models

Z He, T Liang, W Jiao, Z Zhang, Y Yang… - Transactions of the …, 2024 - direct.mit.edu
Large language models (LLMs) have demonstrated impressive capabilities in general
scenarios, exhibiting a level of aptitude that approaches, in some aspects even surpasses …

Findings of the WMT 2023 shared task on quality estimation

F Blain, C Zerva, R Rei, NM Guerreiro… - Proceedings of the …, 2023 - aclanthology.org
We report the results of the WMT 2023 shared task on Quality Estimation, in which the
challenge is to predict the quality of the output of neural machine translation systems at the …

Assessing large language models on climate information

J Bulian, MS Schäfer, A Amini, H Lam… - arXiv preprint arXiv …, 2023 - arxiv.org
As Large Language Models (LLMs) rise in popularity, it is necessary to assess their
capability in critically relevant domains. We present a comprehensive evaluation framework …

Adapting large language models for document-level machine translation

M Wu, TT Vu, L Qu, G Foster, G Haffari - arXiv preprint arXiv:2401.06468, 2024 - arxiv.org
Large language models (LLMs) have made significant strides in various natural language
processing (NLP) tasks. Recent research shows that moderately-sized LLMs often …

Machine translation and the evaluation of its quality

MS Maučec, G Donaj - Recent trends in computational …, 2019 - books.google.com
Machine translation has already become part of our everyday life. This chapter
gives an overview of machine translation approaches. Statistical machine translation was a …

Bhasa: A holistic southeast asian linguistic and cultural evaluation suite for large language models

WQ Leong, JG Ngui, Y Susanto, H Rengarajan… - arXiv preprint arXiv …, 2023 - arxiv.org
The rapid development of Large Language Models (LLMs) and the emergence of novel
abilities with scale have necessitated the construction of holistic, diverse and challenging …

DHP Benchmark: Are LLMs Good NLG Evaluators?

Y Wang, J Yuan, YN Chuang, Z Wang, Y Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) are increasingly serving as evaluators in Natural Language
Generation (NLG) tasks. However, the capabilities of LLMs in scoring NLG quality remain …

Metric score landscape challenge (MSLC23): Understanding metrics' performance on a wider landscape of translation quality

C Lo, S Larkin, R Knowles - … of the Eighth Conference on Machine …, 2023 - aclanthology.org
The Metric Score Landscape Challenge (MSLC23) dataset aims to gain insight into
metric scores on a wider landscape of machine translation (MT) quality. It provides a …

Multi-view fusion for universal translation quality estimation

H Huang, S Wu, K Chen, X Liang, H Di, M Yang… - Information …, 2024 - Elsevier
Machine translation quality estimation (QE) aims to evaluate translation output
without a reference. Despite the progress that has been made, state-of-the-art QE models are proven …

MT-Ranker: Reference-free machine translation evaluation by inter-system ranking

IM Moosa, R Zhang, W Yin - arXiv preprint arXiv:2401.17099, 2024 - arxiv.org
Traditionally, Machine Translation (MT) Evaluation has been treated as a regression
problem--producing an absolute translation-quality score. This approach has two limitations …