Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

Survey of the state of the art in natural language generation: Core tasks, applications and evaluation

A Gatt, E Krahmer - Journal of Artificial Intelligence Research, 2018 - jair.org
This paper surveys the current state of the art in Natural Language Generation (NLG),
defined as the task of generating text or speech from non-linguistic input. A survey of NLG is …

Choosing words in computer-generated weather forecasts

E Reiter, S Sripada, J Hunter, J Yu, I Davy - Artificial Intelligence, 2005 - Elsevier
One of the main challenges in automatically generating textual weather forecasts is
choosing appropriate English words to communicate numeric weather data. A corpus-based …

[BOOK][B] Not exactly: In praise of vagueness

K Van Deemter - 2010 - books.google.com
Not everything is black and white. Our daily lives are full of vagueness or fuzziness.
Language is the most obvious example - for instance, when we describe someone as tall, it is …

[PDF][PDF] Comparing automatic and human evaluation of NLG systems

A Belz, E Reiter - 11th Conference of the European Chapter of the …, 2006 - aclanthology.org
We consider the evaluation problem in Natural Language Generation (NLG) and present
results for evaluating several NLG systems with similar functionality, including a knowledge …

An investigation into the validity of some metrics for automatically evaluating natural language generation systems

E Reiter, A Belz - Computational Linguistics, 2009 - direct.mit.edu
There is growing interest in using automatically computed corpus-based evaluation metrics
to evaluate Natural Language Generation (NLG) systems, because these are often …

Automatic generation of weather forecast texts using comprehensive probabilistic generation-space models

A Belz - Natural Language Engineering, 2008 - cambridge.org
Two important recent trends in natural language generation are (i) probabilistic techniques
and (ii) comprehensive approaches that move away from traditional strictly modular and …

Individual and domain adaptation in sentence planning for dialogue

MA Walker, A Stent, F Mairesse, R Prasad - Journal of Artificial Intelligence …, 2007 - jair.org
One of the biggest challenges in the development and deployment of spoken dialogue
systems is the design of the spoken language generation module. This challenge arises …

Evaluating factual accuracy in complex data-to-text

C Thomson, E Reiter, B Sundararajan - Computer Speech & Language, 2023 - Elsevier
It is essential that data-to-text Natural Language Generation (NLG) systems produce texts
which are factually accurate. We examine accuracy issues in the task of generating …

Rethinking the agreement in human evaluation tasks

J Amidei, P Piwek, A Willis - Proceedings of the 27th International …, 2018 - aclanthology.org
Human evaluations are broadly thought to be more valuable the higher the inter-annotator
agreement. In this paper we examine this idea. We will describe our experiments and …