Evaluation of real-time deep learning turn-taking models for multiple dialogue scenarios

D Lala, K Inoue, T Kawahara - Proceedings of the 20th ACM International …, 2018 - dl.acm.org
The task of identifying when to take a conversational turn is an important function of spoken
dialogue systems. The turn-taking system should also ideally be able to handle many types …

Gated multimodal fusion with contrastive learning for turn-taking prediction in human-robot dialogue

J Yang, P Wang, Y Zhu, M Feng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Turn-taking, aiming to decide when the next speaker can start talking, is an essential
component in building human-robot spoken dialogue systems. Previous studies indicate that …

Turn-Taking Prediction Based on Detection of Transition Relevance Place

K Hara, K Inoue, K Takanashi, T Kawahara - INTERSPEECH, 2019 - sap.ist.i.kyoto-u.ac.jp
We address turn-taking prediction in which spoken dialogue systems predict when to take
the conversational floor. In natural conversations, many turn-taking decisions are arbitrary …

Investigating Linguistic and Semantic Features for Turn-Taking Prediction in Open-Domain Human-Computer Conversation

SZ Razavi, B Kane, LK Schubert - INTERSPEECH, 2019 - cs.rochester.edu
In this paper we address the problem of turn-taking prediction in open-ended
communication between humans and dialogue agents. In a non-task-oriented interaction …

Analysis of the sensitivity of the End-Of-Turn Detection task to errors generated by the Automatic Speech Recognition process

C Montenegro, R Santana, JA Lozano - Engineering Applications of …, 2021 - Elsevier
An End-Of-Turn Detection Module (EOTD-M) is an essential component of
automatic Spoken Dialogue Systems. The capability of correctly detecting whether a user's …

Who speaks next? Turn change and next speaker prediction in multimodal multiparty interaction

U Malik, J Saunier, K Funakoshi… - 2020 IEEE 32nd …, 2020 - ieeexplore.ieee.org
Turn change prediction and next speaker prediction are two important tasks in multimodal,
multiparty human-agent interaction. Predicting a change of dialogue turn and the most …

What BERT Based Language Models Learn in Spoken Transcripts: An Empirical Study

A Kumar, MN Sundararaman, J Vepa - arXiv preprint arXiv:2109.09105, 2021 - arxiv.org
Language Models (LMs) have been ubiquitously leveraged in various tasks including
spoken language understanding (SLU). Spoken language requires careful understanding of …

Timing Generating Networks: Neural Network Based Precise Turn-Taking Timing Prediction in Multiparty Conversation

S Fujie, H Katayama, J Sakuma, T Kobayashi - Interspeech, 2021 - isca-archive.org
A brand new neural network based precise timing generation framework, named the Timing
Generating Network (TGN), is proposed and applied to turn-taking timing decision problems …

Automatic offline annotation of turn-taking transitions in task-oriented dialogue

P Brusco, A Gravano - Computer Speech & Language, 2023 - Elsevier
As the volume of recorded conversations continues to surge, so does the need for their
automatic processing. Plenty of information beyond words may be extracted from the speech …

Modeling Turn-Taking in Human-To-Human Spoken Dialogue Datasets Using Self-Supervised Features

E Morais, M Damasceno, H Aronowitz… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Self-supervised pre-trained models have consistently delivered state-of-the-art results in the
fields of natural language and speech processing. However, we argue that their merits for …