COMET-22: Unbabel-IST 2022 submission for the metrics shared task
In this paper, we present the joint contribution of Unbabel and IST to the WMT 2022 Metrics
Shared Task. Our primary submission, dubbed COMET-22, is an ensemble between a …
Bridging the gap: A survey on integrating (human) feedback for natural language generation
Natural language generation has witnessed significant advancements due to the training of
large language models on vast internet-scale datasets. Despite these advancements, there …
Results of the WMT21 metrics shared task: Evaluating metrics with expert-based human evaluations on TED and news domain
This paper presents the results of the WMT21 Metrics Shared Task. Participants were asked
to score the outputs of the translation systems competing in the WMT21 News Translation …
Efficient methods for natural language processing: A survey
Recent work in natural language processing (NLP) has yielded appealing results from
scaling model parameters and training data; however, using only scale to improve …
Results of the WMT20 metrics shared task
This paper presents the results of the WMT20 Metrics Shared Task. Participants were asked
to score the outputs of the translation systems competing in the WMT20 News Translation …
Quality-aware decoding for neural machine translation
Despite the progress in machine translation quality estimation and evaluation in the last
years, decoding in neural machine translation (NMT) is mostly oblivious to this and centers …
Understanding and detecting hallucinations in neural machine translation via model introspection
Neural sequence generation models are known to “hallucinate”, by producing outputs that
are unrelated to the source text. These hallucinations are potentially harmful, yet it remains …
Are references really needed? Unbabel-IST 2021 submission for the metrics shared task
In this paper, we present the joint contribution of Unbabel and IST to the WMT 2021 Metrics
Shared Task. With this year's focus on Multidimensional Quality Metric (MQM) as the ground …
Learning compact metrics for MT
Recent developments in machine translation and multilingual text generation have led
researchers to adopt trained metrics such as COMET or BLEURT, which treat evaluation as …
MENLI: Robust evaluation metrics from natural language inference
Recently proposed BERT-based evaluation metrics for text generation perform well on
standard benchmarks but are vulnerable to adversarial attacks, e.g., relating to information …