How to train your fillers: uh and um in spontaneous speech synthesis

[PDF] Where's the uh, hesitation? The interplay between filled pause location, speech rate and fundamental frequency in perception of confidence

A Kirkland, H Lameris, E Székely, J Gustafson - INTERSPEECH, 2022 - isca-archive.org

Much of the research investigating the perception of speaker certainty has relied on either
attempting to elicit prosodic features in read speech, or artificial manipulation of recorded …

Cited by 21 · Related articles · All 6 versions

[PDF] arxiv.org

Prosody-controllable spontaneous TTS with neural HMMs

H Lameris, S Mehta, GE Henter… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Spontaneous speech has many affective and pragmatic functions that are interesting and
challenging to model in TTS. However, the presence of reduced articulation, fillers …

Cited by 18 · Related articles · All 5 versions

[PDF] diva-portal.org

Breathing and speech planning in spontaneous speech synthesis

É Székely, GE Henter, J Beskow… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

Breathing and speech planning in spontaneous speech are coordinated processes, often
exhibiting disfluent patterns. While synthetic speech is not subject to respiratory needs …

Cited by 38 · Related articles · All 5 versions

[PDF] arxiv.org

Fillers in spoken language understanding: Computational and psycholinguistic perspectives

T Dinkar, C Clavel, I Vasilescu - arXiv preprint arXiv:2301.10761, 2023 - arxiv.org

Disfluencies (i.e. interruptions in the regular flow of speech), are ubiquitous to spoken
discourse. Fillers ("uh", "um") are disfluencies that occur the most frequently compared to …

Cited by 11 · Related articles · All 5 versions

[PDF] nsf.gov

Perception of concatenative vs. neural text-to-speech (TTS): Differences in intelligibility in noise and language attitudes

M Cohn, G Zellou - Proceedings of Interspeech, 2020 - par.nsf.gov

This study tests speech-in-noise perception and social ratings of speech produced by
different text-to-speech (TTS) synthesis methods. We used identical speaker training …

Cited by 33 · Related articles · All 14 versions

[PDF] researchgate.net

[PDF] Personality in the mix - investigating the contribution of fillers and speaking style to the perception of spontaneous speech synthesis

J Gustafson, J Beskow, E Székely - Proc. 11th ISCA SSW, 2021 - researchgate.net

Studies on human-human interactions have shown that the fluency of a speaker
influences the perception of personality. Adding fillers and discourse markers can make the …

Cited by 17 · Related articles · All 7 versions

[PDF] aclanthology.org

Evaluating sampling-based filler insertion with spontaneous TTS

S Wang, J Gustafson, É Székely - Proceedings of the Thirteenth …, 2022 - aclanthology.org

Inserting fillers (such as “um”, “like”) into clean speech text has a rich history of study. One
major application is to make dialogue systems sound more spontaneous. The ambiguity of …

Cited by 8 · Related articles · All 5 versions

[PDF] ed.ac.uk

Synthesis after a couple PINTs: Investigating the role of pause-internal phonetic particles in speech synthesis and perception

M Elmers, J O'Mahony, E Székely - Interspeech 2023, 2023 - research.ed.ac.uk

Pause-internal phonetic particles (PINTs), such as breath noises, tongue clicks and
hesitations, play an important role in speech perception but are rarely modeled in speech …

Cited by 6 · Related articles · All 11 versions

[PDF] arxiv.org

Transplantation of conversational speaking style with interjections in sequence-to-sequence speech synthesis

R Fernandez, D Haws, G Lorberbom… - arXiv preprint arXiv …, 2022 - arxiv.org

Sequence-to-Sequence Text-to-Speech architectures that directly generate low-level
acoustic features from phonetic sequences are known to produce natural and expressive …

Cited by 9 · Related articles · All 7 versions

[PDF] arxiv.org

On the use of self-supervised speech representations in spontaneous speech synthesis

S Wang, GE Henter, J Gustafson, E Székely - arXiv preprint arXiv …, 2023 - arxiv.org

Self-supervised learning (SSL) speech representations learned from large amounts of
diverse, mixed-quality speech data without transcriptions are gaining ground in many …

Cited by 3 · Related articles · All 6 versions