A survey of evaluation metrics used for NLG systems
In the last few years, a large number of automatic evaluation metrics have been proposed for
evaluating Natural Language Generation (NLG) systems. The rapid development and …
A comprehensive survey on various fully automatic machine translation evaluation metrics
The fast advancement in machine translation models necessitates the development of
accurate evaluation metrics that would allow researchers to track the progress in text …
Semeval-2016 task 1: Semantic textual similarity, monolingual and cross-lingual evaluation
Semantic Textual Similarity (STS) seeks to measure the degree of semantic equivalence
between two snippets of text. Similarity is expressed on an ordinal scale that spans from …
Semantic structural evaluation for text simplification
Current measures for evaluating text simplification systems focus on evaluating lexical text
aspects, neglecting its structural aspects. In this paper we propose the first measure to …
On the limitations of cross-lingual encoders as exposed by reference-free machine translation evaluation
Evaluation of cross-lingual encoders is usually performed either via zero-shot cross-lingual
transfer in supervised downstream tasks or via unsupervised cross-lingual textual similarity …
MEANT 2.0: Accurate semantic MT evaluation for any output language
C Lo - Proceedings of the second conference on machine …, 2017 - aclanthology.org
We describe a new version of MEANT, which participated in the metrics task of the Second
Conference on Machine Translation (WMT 2017). MEANT 2.0 uses idf-weighted …
Adequacy–fluency metrics: Evaluating MT in the continuous space model framework
This work extends and evaluates a two-dimensional automatic evaluation metric for machine
translation, which is designed to operate at the sentence level. The metric is based on the …
SemSyn: Semantic-Syntactic Similarity Based Automatic Machine Translation Evaluation Metric
Machine translation evaluation is challenging for natural languages because
different languages behave differently on the same dataset. Lexical-based metrics have …
On the evaluation of semantic phenomena in neural machine translation using natural language inference
We propose a process for investigating the extent to which sentence representations arising
from neural machine translation (NMT) systems encode distinct semantic phenomena. We …
Reference-less measure of faithfulness for grammatical error correction
We propose USim, a semantic measure for Grammatical Error Correction (GEC) that
measures the semantic faithfulness of the output to the source, thereby complementing …