ChatGPT to replace crowdsourcing of paraphrases for intent classification: Higher diversity and comparable model robustness

J Cegin, J Simko, P Brusilovsky - arXiv preprint arXiv:2305.12947, 2023 - arxiv.org
The emergence of generative large language models (LLMs) raises the question: what will
be its impact on crowdsourcing? Traditionally, crowdsourcing has been used for acquiring …

Annotation error detection: Analyzing the past and present for a more coherent future

JC Klie, B Webber, I Gurevych - Computational Linguistics, 2023 - direct.mit.edu
Annotated data is an essential ingredient in natural language processing for training and
evaluating machine learning models. It is therefore very desirable for the annotations to be …

User utterance acquisition for training task-oriented bots: A review of challenges, techniques and opportunities

MA Yaghoub-Zadeh-Fard, B Benatallah… - IEEE Internet …, 2020 - ieeexplore.ieee.org
Building conversational task-oriented bots requires large and diverse sets of annotated user
utterances to learn mappings between natural language utterances and user intents. Given …

End-to-end learning of flowchart grounded task-oriented dialogs

D Raghu, S Agarwal, S Joshi - arXiv preprint arXiv:2109.07263, 2021 - arxiv.org
We propose a novel problem within end-to-end learning of task-oriented dialogs (TOD), in
which the dialog system mimics a troubleshooting agent who helps a user by diagnosing …

Leveraging user paraphrasing behavior in dialog systems to automatically collect annotations for long-tail utterances

T Falke, M Boese, D Sorokin, C Tirkaz… - Proceedings of the 28th …, 2020 - aclanthology.org
In large-scale commercial dialog systems, users express the same request in a wide variety
of alternative ways with a long tail of less frequent alternatives. Handling the full range of this …

Inconsistencies in crowdsourced slot-filling annotations: A typology and identification methods

S Larson, A Cheung, A Mahendran… - Proceedings of the …, 2020 - aclanthology.org
Slot-filling models in task-driven dialog systems rely on carefully annotated training data.
However, annotations by crowd workers are often inconsistent or contain errors. Simple …

A natural language processing approach to detect inconsistencies in death investigation notes attributing suicide circumstances

S Wang, Y Zhou, Z Han, C Tao, Y Xiao, Y Ding… - Communications …, 2024 - nature.com
Background: Data accuracy is essential for scientific research and policy development. The
National Violent Death Reporting System (NVDRS) data is widely used for discovering the …

ActiveAED: A human in the loop improves annotation error detection

L Weber, B Plank - arXiv preprint arXiv:2305.20045, 2023 - arxiv.org
Manually annotated datasets are crucial for training and evaluating Natural Language
Processing models. However, recent work has discovered that even widely-used benchmark …

Dynamic word recommendation to obtain diverse crowdsourced paraphrases of user utterances

MA Yaghoub-Zadeh-Fard, B Benatallah… - Proceedings of the 25th …, 2020 - dl.acm.org
Building task-oriented bots requires mapping a user utterance to an intent with its associated
entities to serve the request. Doing so is not easy since it requires large quantities of high …

Uncovering Misattributed Suicide Causes through Annotation Inconsistency Detection in Death Investigation Notes

S Wang, Y Zhou, Z Han, C Tao, Y Xiao, Y Ding… - arXiv preprint arXiv …, 2024 - arxiv.org
Data accuracy is essential for scientific research and policy development. The National
Violent Death Reporting System (NVDRS) data is widely used for discovering the patterns …