Exploiting morphological and phonological features to improve prosodic phrasing for Mongolian speech synthesis

R Liu, B Sisman, F Bao, J Yang… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Prosodic phrasing is an important factor that affects naturalness and intelligibility in
text-to-speech synthesis. Studies show that deep learning techniques improve prosodic phrasing …

Modeling prosodic phrasing with multi-task learning in Tacotron-based TTS

R Liu, B Sisman, F Bao, G Gao… - IEEE Signal Processing …, 2020 - ieeexplore.ieee.org
Tacotron-based end-to-end speech synthesis has shown remarkable voice quality.
However, the rendering of prosody in the synthesized speech remains to be improved …

Deep learning for prominence detection in children's read speech

M Vaidya, K Sabu, P Rao - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
The detection of perceived prominence in speech has attracted approaches ranging from
the design of knowledge-based linguistic and acoustic features to the automatic feature …

Automatic Prosody Annotation with Pre-Trained Text-Speech Model

Z Dai, J Yu, Y Wang, N Chen, Y Bian, G Li… - arXiv preprint arXiv …, 2022 - arxiv.org
Prosodic boundary plays an important role in text-to-speech synthesis (TTS) in terms of
naturalness and readability. However, the acquisition of prosodic boundary labels relies on …

Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of Speech-Silence and Word-Punctuation

J Zhong, Y Li, H Huang, K Richmond, J Liu… - Proc. Interspeech …, 2024 - isca-archive.org
In expressive and controllable Text-to-Speech (TTS), explicit prosodic features significantly
improve the naturalness and controllability of synthesised speech. However, manual …

Pitchtron: Towards audiobook generation from ordinary people's voices

S Jung, H Kim - arXiv preprint arXiv:2005.10456, 2020 - arxiv.org
In this paper, we explore prosody transfer for audiobook generation under rather realistic
condition where training DB is plain audio mostly from multiple ordinary people and …

Incorporating prosodic events in text-to-speech synthesis

R Sloan, A Adigwe, S Mohandoss… - Proceedings of the …, 2022 - isca-archive.org
While producing accurate prosody can significantly improve the naturalness and
comprehensibility of synthesized speech, many Text-to-Speech (TTS) systems still do not …

Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP

J Zhong, Y Li, H Huang, K Richmond, J Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
In expressive and controllable Text-to-Speech (TTS), explicit prosodic features significantly
improve the naturalness and controllability of synthesised speech. However, manual …

Korean Prosody Phrase Boundary Prediction Model for Speech Synthesis Service in Smart Healthcare

M Kim, Y Jung, HC Kwon - Electronics, 2021 - mdpi.com
Speech processing technology has great potential in the medical field to provide beneficial
solutions for both patients and doctors. Speech interfaces, represented by speech synthesis …

The effect of different information sources on prosodic boundary perception

B Ludusan, M Morii, Y Minagawa, E Dupoux - JASA Express Letters, 2021 - pubs.aip.org
This study aims to quantify the effect of several information sources: acoustic, higher-level
linguistic, and knowledge of the prosodic system of the language, on the perception of …