Towards explainable evaluation metrics for machine translation

C Leiter, P Lertvittayakumjorn, M Fomicheva… - Journal of Machine …, 2024 - jmlr.org
Unlike classical lexical overlap metrics such as BLEU, most current evaluation metrics for
machine translation (for example, COMET or BERTScore) are based on black-box large …

Machine translation meta evaluation through translation accuracy challenge sets

N Moghe, A Fazla, C Amrhein, T Kocmi… - Computational …, 2024 - direct.mit.edu
Recent machine translation (MT) metrics calibrate their effectiveness by correlating with
human judgment. However, these results are often obtained by averaging predictions across …

Segment-level evaluation of machine translation metrics

N Moghe - 2024 - era.ed.ac.uk
Most metrics evaluating Machine Translation (MT) claim their effectiveness by demonstrating
their ability to distinguish the quality of different MT systems over a large corpus (system …