Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Abstract Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

Survey on reinforcement learning for language processing

V Uc-Cetina, N Navarro-Guerrero… - Artificial Intelligence …, 2023 - Springer
In recent years some researchers have explored the use of reinforcement learning (RL)
algorithms as key components in the solution of various natural language processing (NLP) …

Overview of the dialogue breakdown detection challenge 4

R Higashinaka, LF D'Haro, B Abu Shawar… - … and Flexibility in …, 2021 - Springer
To promote the research and development of dialogue breakdown detection for dialogue
systems, we have been organizing a series of dialogue breakdown detection challenges to …

Underreporting of errors in NLG output, and what to do about it

E Van Miltenburg, MA Clinciu, O Dušek… - arXiv preprint arXiv …, 2021 - arxiv.org
We observe a severe under-reporting of the different kinds of errors that Natural Language
Generation systems make. This is a problem, because mistakes are an important indicator of …

Overview of the sixth dialog system technology challenge: DSTC6

C Hori, J Perez, R Higashinaka, T Hori… - Computer Speech & …, 2019 - Elsevier
This paper describes the experimental setups and the evaluation results of the sixth Dialog
System Technology Challenges (DSTC6) aiming to develop end-to-end dialogue systems …

Overview of the NTCIR-12 Short Text Conversation Task

L Shang, T Sakai, Z Lu, H Li, R Higashinaka, Y Miyao - NTCIR, 2016 - research.nii.ac.jp
We give an overview of the NII Testbeds and Community for Information access Research
(NTCIR)-13 Short Text Conversation (STC) task, which was a core task of NTCIR-13. At …

Integrated taxonomy of errors in chat-oriented dialogue systems

R Higashinaka, M Araki, H Tsukahara… - Proceedings of the …, 2021 - aclanthology.org
This paper proposes a taxonomy of errors in chat-oriented dialogue systems. Previously, two
taxonomies were proposed; one is theory-driven and the other data-driven. The former …

A taxonomy, data set, and benchmark for detecting and classifying malevolent dialogue responses

Y Zhang, P Ren, M De Rijke - Journal of the Association for …, 2021 - Wiley Online Library
Conversational interfaces are increasingly popular as a way of connecting people to
information. With the increased generative capacity of corpus‐based conversational agents …

Annotating errors and emotions in human-chatbot interactions in Italian

M Sanguinetti, A Mazzei, V Patti, M Scalerandi… - The 14th Linguistic …, 2020 - iris.unito.it
This paper describes a novel annotation scheme specifically designed for a customer-
service context where written interactions take place between a given user and the chatbot …

A role-selected sharing network for joint machine-human chatting handoff and service satisfaction analysis

J Liu, K Song, Y Kang, G He, Z Jiang, C Sun… - arXiv preprint arXiv …, 2021 - arxiv.org
Chatbot is increasingly thriving in different domains, however, because of unexpected
discourse complexity and training data sparseness, its potential distrust hatches vital …