Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text
S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …
Survey of the state of the art in natural language generation: Core tasks, applications and evaluation
This paper surveys the current state of the art in Natural Language Generation (NLG),
defined as the task of generating text or speech from non-linguistic input. A survey of NLG is …
Choosing words in computer-generated weather forecasts
One of the main challenges in automatically generating textual weather forecasts is
choosing appropriate English words to communicate numeric weather data. A corpus-based …
[BOOK] Not exactly: In praise of vagueness
K Van Deemter - 2010 - books.google.com
Not everything is black and white. Our daily lives are full of vagueness or fuzziness.
Language is the most obvious example; for instance, when we describe someone as tall, it is …
Comparing automatic and human evaluation of NLG systems
We consider the evaluation problem in Natural Language Generation (NLG) and present
results for evaluating several NLG systems with similar functionality, including a knowledge …
An investigation into the validity of some metrics for automatically evaluating natural language generation systems
There is growing interest in using automatically computed corpus-based evaluation metrics
to evaluate Natural Language Generation (NLG) systems, because these are often …
Automatic generation of weather forecast texts using comprehensive probabilistic generation-space models
A Belz - Natural Language Engineering, 2008 - cambridge.org
Two important recent trends in natural language generation are (i) probabilistic techniques
and (ii) comprehensive approaches that move away from traditional strictly modular and …
Individual and domain adaptation in sentence planning for dialogue
One of the biggest challenges in the development and deployment of spoken dialogue
systems is the design of the spoken language generation module. This challenge arises …
Evaluating factual accuracy in complex data-to-text
It is essential that data-to-text Natural Language Generation (NLG) systems produce texts
which are factually accurate. We examine accuracy issues in the task of generating …
Rethinking the agreement in human evaluation tasks
Human evaluations are broadly thought to be more valuable the higher the inter-annotator
agreement. In this paper we examine this idea. We will describe our experiments and …