Interactive natural language processing

Z Wang, G Zhang, K Yang, N Shi, W Zhou… - arXiv preprint arXiv …, 2023 - arxiv.org
Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within
the field of NLP, aimed at addressing limitations in existing frameworks while aligning with …

Towards explainable evaluation metrics for machine translation

C Leiter, P Lertvittayakumjorn, M Fomicheva… - Journal of Machine …, 2024 - jmlr.org
Unlike classical lexical overlap metrics such as BLEU, most current evaluation metrics for
machine translation (for example, COMET or BERTScore) are based on black-box large …

Characterizing the Confidence of Large Language Model-Based Automatic Evaluation Metrics

R Stureborg, D Alikaniotis, Y Suhara - Proceedings of the 18th …, 2024 - aclanthology.org
There has recently been a growing interest in using Large Language Models (LLMs) to
evaluate NLP tasks automatically. Considerable research effort has been put into improving …

Tailoring domain adaptation for machine translation quality estimation

JPR Sharami, D Shterionov, F Blain… - arXiv preprint arXiv …, 2023 - arxiv.org
While quality estimation (QE) can play an important role in the translation process, its
effectiveness relies on the availability and quality of training data. For QE in particular, high …

Predicting Machine Translation Performance on Low-Resource Languages: The Role of Domain Similarity

E Khiu, H Toossi, D Anugraha, J Liu, J Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Fine-tuning and testing a multilingual large language model is expensive and challenging
for low-resource languages (LRLs). While previous studies have predicted the performance …

Fine-tuned machine translation metrics struggle in unseen domains

V Zouhar, S Ding, A Currey, T Badeka, J Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce a new, extensive multidimensional quality metrics (MQM) annotated dataset
covering 11 language pairs in the biomedical domain. We use this dataset to investigate …