Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text
S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …
Survey on reinforcement learning for language processing
V Uc-Cetina, N Navarro-Guerrero… - Artificial Intelligence …, 2023 - Springer
In recent years some researchers have explored the use of reinforcement learning (RL)
algorithms as key components in the solution of various natural language processing (NLP) …
Overview of the dialogue breakdown detection challenge 4
To promote the research and development of dialogue breakdown detection for dialogue
systems, we have been organizing a series of dialogue breakdown detection challenges to …
Underreporting of errors in NLG output, and what to do about it
We observe a severe under-reporting of the different kinds of errors that Natural Language
Generation systems make. This is a problem, because mistakes are an important indicator of …
Overview of the sixth dialog system technology challenge: DSTC6
This paper describes the experimental setups and the evaluation results of the sixth Dialog
System Technology Challenges (DSTC6) aiming to develop end-to-end dialogue systems …
Overview of the NTCIR-12 Short Text Conversation Task.
We give an overview of the NII Testbeds and Community for Information access Research
(NTCIR)-13 Short Text Conversation (STC) task, which was a core task of NTCIR-13. At …
Integrated taxonomy of errors in chat-oriented dialogue systems
R Higashinaka, M Araki, H Tsukahara… - Proceedings of the …, 2021 - aclanthology.org
This paper proposes a taxonomy of errors in chat-oriented dialogue systems. Previously, two
taxonomies were proposed; one is theory-driven and the other data-driven. The former …
A taxonomy, data set, and benchmark for detecting and classifying malevolent dialogue responses
Conversational interfaces are increasingly popular as a way of connecting people to
information. With the increased generative capacity of corpus‐based conversational agents …
Annotating errors and emotions in human-chatbot interactions in Italian
This paper describes a novel annotation scheme specifically designed for a customer-
service context where written interactions take place between a given user and the chatbot …
A role-selected sharing network for joint machine-human chatting handoff and service satisfaction analysis
Chatbot is increasingly thriving in different domains, however, because of unexpected
discourse complexity and training data sparseness, its potential distrust hatches vital …