Quality-aware decoding for neural machine translation

P Fernandes, A Farinhas, R Rei, JGC de Souza… - arXiv preprint arXiv …, 2022 - arxiv.org
Despite the progress in machine translation quality estimation and evaluation in the last
years, decoding in neural machine translation (NMT) is mostly oblivious to this and centers …

The devil is in the errors: Leveraging large language models for fine-grained machine translation evaluation

P Fernandes, D Deutsch, M Finkelstein, P Riley… - arXiv preprint arXiv …, 2023 - arxiv.org
Automatic evaluation of machine translation (MT) is a critical tool driving the rapid iterative
development of MT systems. While considerable progress has been made on estimating a …

Findings of the WMT 2021 shared task on quality estimation

L Specia, F Blain, M Fomicheva, C Zerva… - Proceedings of the …, 2021 - aclanthology.org
We report the results of the WMT 2021 shared task on Quality Estimation, where the
challenge is to predict the quality of the output of neural machine translation systems at the …

On early detection of hallucinations in factual question answering

B Snyder, M Moisescu, MB Zafar - Proceedings of the 30th ACM SIGKDD …, 2024 - dl.acm.org
While large language models (LLMs) have taken great strides towards helping humans with
a plethora of tasks, hallucinations remain a major impediment towards gaining user trust …

MLQE-PE: A multilingual quality estimation and post-editing dataset

M Fomicheva, S Sun, E Fonseca, C Zerva… - arXiv preprint arXiv …, 2020 - arxiv.org
We present MLQE-PE, a new dataset for Machine Translation (MT) Quality Estimation (QE)
and Automatic Post-Editing (APE). The dataset contains eleven language pairs, with human …

From Handcrafted Features to LLMs: A Brief Survey for Machine Translation Quality Estimation

H Zhao, Y Liu, S Tao, W Meng, Y Chen, X Geng… - arXiv preprint arXiv …, 2024 - arxiv.org
Machine Translation Quality Estimation (MTQE) is the task of estimating the quality of
machine-translated text in real time without the need for reference translations, which is of …

Simple LLM prompting is state-of-the-art for robust and multilingual dialogue evaluation

J Mendonça, P Pereira, H Moniz, JP Carvalho… - arXiv preprint arXiv …, 2023 - arxiv.org
Despite significant research effort in the development of automatic dialogue evaluation
metrics, little thought is given to evaluating dialogues other than in English. At the same time …

Fake artificial intelligence generated contents (FAIGC): A survey of theories, detection methods, and opportunities

X Yu, Y Wang, Y Chen, Z Tao, D Xi, S Song… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, generative artificial intelligence models, represented by Large Language
Models (LLMs) and Diffusion Models (DMs), have revolutionized content production …

IST-Unbabel 2021 submission for the explainable quality estimation shared task

M Treviso, NM Guerreiro, R Rei… - Proceedings of the 2nd …, 2021 - aclanthology.org
We present the joint contribution of Instituto Superior Técnico (IST) and Unbabel to the
Explainable Quality Estimation (QE) shared task, where systems were submitted to two …

NJUNLP's participation for the WMT2022 quality estimation shared task

X Geng, Y Zhang, S Huang, S Tao… - Proceedings of the …, 2022 - aclanthology.org
This paper presents submissions of the NJUNLP team in WMT 2022 Quality Estimation
shared task 1, where the goal is to predict the sentence-level and word-level quality for …