Disfluency insertion for spontaneous TTS: Formalization and proof of concept

J Liu, R Takanobu, J Wen, D Wan, H Li, W Nie… - arXiv preprint arXiv …, 2020 - arxiv.org

Most language understanding models in task-oriented dialog systems are trained on a small
amount of annotated training data, and evaluated in a small set from the same distribution …

Cited by 52 - Related articles - All 7 versions

Conversational end-to-end TTS for voice agents

H Guo, S Zhang, FK Soong, L He… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org

End-to-end neural TTS has achieved excellent performance on reading style speech
synthesis. However, it is still a challenge to build a high-quality conversational TTS due to …

Cited by 78 - Related articles - All 4 versions

[PDF] How to train your fillers: uh and um in spontaneous speech synthesis

É Székely, GE Henter, J Beskow… - The 10th ISCA Speech …, 2019 - diva-portal.org

Using spontaneous conversational speech for TTS raises questions on how disfluencies
such as filled pauses (FPs) should be approached. Detailed annotation of FPs in training …

Cited by 39 - Related articles - All 9 versions

Evaluating sampling-based filler insertion with spontaneous TTS

S Wang, J Gustafson, É Székely - Proceedings of the Thirteenth …, 2022 - aclanthology.org

Inserting fillers (such as “um”, “like”) to clean speech text has a rich history of study. One
major application is to make dialogue systems sound more spontaneous. The ambiguity of …

Cited by 8 - Related articles - All 5 versions

SponTTS: modeling and transferring spontaneous style for TTS

H Li, X Zhu, L Xue, Y Song, Y Chen… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Spontaneous speaking style exhibits notable differences from other speaking styles due to
various spontaneous phenomena (e.g., filled pauses, prolongation) and substantial prosody …

Cited by 5 - Related articles - All 3 versions

Personalized filled-pause generation with group-wise prediction models

Y Matsunaga, T Saeki, S Takamichi… - arXiv preprint arXiv …, 2022 - arxiv.org

In this paper, we propose a method to generate personalized filled pauses (FPs) with
group-wise prediction models. Compared with fluent text generation, disfluent text generation has …

Cited by 3 - Related articles - All 6 versions

[PDF] Ethical Considerations in the Use of Disfluencies in AI-Generated Speech

RL Rose - CSEDU (2), 2023 - scitepress.org

Disfluency occurs regularly in natural, everyday speech. Such phenomena as silent pauses,
filled pauses (uh, um in English) and repairs occur with regular frequency and training data …

[PDF] Traitement automatique du style dans le langage naturel: quelques contributions et perspectives

G Lecorvé - 2020 - hal.science

The study of natural language is present in multiple scientific disciplines. In linguistics, this
study is an end in itself and spans many specific areas. In the neurosciences also, language …

Cited by 1 - Related articles - All 5 versions

[PDF] Detection of Fillers in Conversational Speech

E Calo, T Rosemplatt, I Sheikh, C Cerisara - idmc.univ-lorraine.fr

Large vocabulary Automatic Speech Recognition (ASR) systems require a huge amount of
human transcribed speech data. Such speech corpora are either purchased from dataset …

Systems and methods for unsupervised paraphrase mining

B Golshan, C Chen, WC Tan, MA Danni - US Patent 11,741,312, 2023 - Google Patents

Disclosed embodiments relate to aligning pairs of sentences. Techniques can include
receiving a plurality of sentences; generating a graph for each of at least two sentences of …