Robustness testing of language understanding in task-oriented dialog

J Liu, R Takanobu, J Wen, D Wan, H Li, W Nie… - arXiv preprint arXiv …, 2020 - arxiv.org
Most language understanding models in task-oriented dialog systems are trained on a small
amount of annotated training data, and evaluated in a small set from the same distribution …

Conversational end-to-end TTS for voice agents

H Guo, S Zhang, FK Soong, L He… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
End-to-end neural TTS has achieved excellent performance on reading style speech
synthesis. However, it is still a challenge to build a high-quality conversational TTS due to …

How to train your fillers: uh and um in spontaneous speech synthesis

É Székely, GE Henter, J Beskow… - The 10th ISCA Speech …, 2019 - diva-portal.org
Using spontaneous conversational speech for TTS raises questions on how disfluencies
such as filled pauses (FPs) should be approached. Detailed annotation of FPs in training …

Evaluating sampling-based filler insertion with spontaneous TTS

S Wang, J Gustafson, É Székely - Proceedings of the Thirteenth …, 2022 - aclanthology.org
Inserting fillers (such as “um”, “like”) to clean speech text has a rich history of study. One
major application is to make dialogue systems sound more spontaneous. The ambiguity of …

SponTTS: modeling and transferring spontaneous style for TTS

H Li, X Zhu, L Xue, Y Song, Y Chen… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Spontaneous speaking style exhibits notable differences from other speaking styles due to
various spontaneous phenomena (e.g., filled pauses, prolongation) and substantial prosody …

Personalized filled-pause generation with group-wise prediction models

Y Matsunaga, T Saeki, S Takamichi… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we propose a method to generate personalized filled pauses (FPs) with group-
wise prediction models. Compared with fluent text generation, disfluent text generation has …

Ethical Considerations in the Use of Disfluencies in AI-Generated Speech.

RL Rose - CSEDU (2), 2023 - scitepress.org
Disfluency occurs regularly in natural, everyday speech. Such phenomena as silent pauses,
filled pauses (uh, um in English) and repairs occur with regular frequency and training data …

Traitement automatique du style dans le langage naturel: quelques contributions et perspectives

G Lecorvé - 2020 - hal.science
The study of natural language is present in multiple scientific disciplines. In linguistics, this
study is purpose itself and comes in many specific areas. In neuro-sciences also, language …

Detection of Fillers in Conversational Speech

E Calo, T Rosemplatt, I Sheikh, C Cerisara - idmc.univ-lorraine.fr
Large vocabulary Automatic Speech Recognition (ASR) systems require a huge amount of
human transcribed speech data. Such speech corpora are either purchased from dataset …

Systems and methods for unsupervised paraphrase mining

B Golshan, C Chen, WC Tan, MA Danni - US Patent 11,741,312, 2023 - Google Patents
Disclosed embodiments relate to aligning pairs of sentences. Techniques can include
receiving a plurality of sentences; generating a graph for each of at least two sentences of …