Towards explainable evaluation metrics for machine translation
Unlike classical lexical overlap metrics such as BLEU, most current evaluation metrics for
machine translation (for example, COMET or BERTScore) are based on black-box large …
Machine translation meta evaluation through translation accuracy challenge sets
Recent machine translation (MT) metrics demonstrate their effectiveness by correlating with
human judgment. However, these results are often obtained by averaging predictions across …
Segment-level evaluation of machine translation metrics
N Moghe - 2024 - era.ed.ac.uk
Most metrics evaluating Machine Translation (MT) claim their effectiveness by demonstrating
their ability to distinguish the quality of different MT systems over a large corpus (system …