Where's the uh, hesitation? The interplay between filled pause location, speech rate and fundamental frequency in perception of confidence.

A Kirkland, H Lameris, E Székely, J Gustafson - INTERSPEECH, 2022 - isca-archive.org
Much of the research investigating the perception of speaker certainty has relied on either
attempting to elicit prosodic features in read speech, or artificial manipulation of recorded …

Prosody-controllable spontaneous TTS with neural HMMs

H Lameris, S Mehta, GE Henter… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Spontaneous speech has many affective and pragmatic functions that are interesting and
challenging to model in TTS. However, the presence of reduced articulation, fillers …

Breathing and speech planning in spontaneous speech synthesis

É Székely, GE Henter, J Beskow… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Breathing and speech planning in spontaneous speech are coordinated processes, often
exhibiting disfluent patterns. While synthetic speech is not subject to respiratory needs …

Fillers in spoken language understanding: Computational and psycholinguistic perspectives

T Dinkar, C Clavel, I Vasilescu - arXiv preprint arXiv:2301.10761, 2023 - arxiv.org
Disfluencies (i.e. interruptions in the regular flow of speech), are ubiquitous to spoken
discourse. Fillers ("uh", "um") are disfluencies that occur the most frequently compared to …

Perception of concatenative vs. neural text-to-speech (TTS): Differences in intelligibility in noise and language attitudes

M Cohn, G Zellou - Proceedings of Interspeech, 2020 - par.nsf.gov
This study tests speech-in-noise perception and social ratings of speech produced by
different text-to-speech (TTS) synthesis methods. We used identical speaker training …

Personality in the mix-investigating the contribution of fillers and speaking style to the perception of spontaneous speech synthesis

J Gustafson, J Beskow, É Székely - Proc. 11th ISCA SSW, 2021 - researchgate.net
Studies on human-human interactions have shown that the fluency of a speaker
influences the perception of personality. Adding fillers and discourse markers can make the …

Evaluating sampling-based filler insertion with spontaneous TTS

S Wang, J Gustafson, É Székely - Proceedings of the Thirteenth …, 2022 - aclanthology.org
Inserting fillers (such as “um”, “like”) to clean speech text has a rich history of study. One
major application is to make dialogue systems sound more spontaneous. The ambiguity of …

Synthesis after a couple PINTs: Investigating the role of pause-internal phonetic particles in speech synthesis and perception

M Elmers, J O'Mahony, E Székely - Interspeech 2023, 2023 - research.ed.ac.uk
Pause-internal phonetic particles (PINTs), such as breath noises, tongue clicks and
hesitations, play an important role in speech perception but are rarely modeled in speech …

Transplantation of conversational speaking style with interjections in sequence-to-sequence speech synthesis

R Fernandez, D Haws, G Lorberbom… - arXiv preprint arXiv …, 2022 - arxiv.org
Sequence-to-Sequence Text-to-Speech architectures that directly generate low level
acoustic features from phonetic sequences are known to produce natural and expressive …

On the use of self-supervised speech representations in spontaneous speech synthesis

S Wang, GE Henter, J Gustafson, E Székely - arXiv preprint arXiv …, 2023 - arxiv.org
Self-supervised learning (SSL) speech representations learned from large amounts of
diverse, mixed-quality speech data without transcriptions are gaining ground in many …