Prosody prediction from syntactic, lexical, and word embedding features

Exploiting morphological and phonological features to improve prosodic phrasing for Mongolian speech synthesis

R Liu, B Sisman, F Bao, J Yang… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org

Prosodic phrasing is an important factor that affects naturalness and intelligibility in
text-to-speech synthesis. Studies show that deep learning techniques improve prosodic phrasing …

Cited by 30 (5 versions)

Modeling prosodic phrasing with multi-task learning in tacotron-based TTS

R Liu, B Sisman, F Bao, G Gao… - IEEE Signal Processing …, 2020 - ieeexplore.ieee.org

Tacotron-based end-to-end speech synthesis has shown remarkable voice quality.
However, the rendering of prosody in the synthesized speech remains to be improved …

Cited by 25 (4 versions)

Deep learning for prominence detection in children's read speech

M Vaidya, K Sabu, P Rao - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org

The detection of perceived prominence in speech has attracted approaches ranging from
the design of knowledge-based linguistic and acoustic features to the automatic feature …

Cited by 7 (9 versions)

Automatic Prosody Annotation with Pre-Trained Text-Speech Model

Z Dai, J Yu, Y Wang, N Chen, Y Bian, G Li… - arXiv preprint arXiv …, 2022 - arxiv.org

Prosodic boundary plays an important role in text-to-speech synthesis (TTS) in terms of
naturalness and readability. However, the acquisition of prosodic boundary labels relies on …

Cited by 4 (8 versions)

Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of Speech-Silence and Word-Punctuation

J Zhong, Y Li, H Huang, K Richmond, J Liu… - Proc. Interspeech …, 2024 - isca-archive.org

In expressive and controllable Text-to-Speech (TTS), explicit prosodic features significantly
improve the naturalness and controllability of synthesised speech. However, manual …

Pitchtron: Towards audiobook generation from ordinary people's voices

S Jung, H Kim - arXiv preprint arXiv:2005.10456, 2020 - arxiv.org

In this paper, we explore prosody transfer for audiobook generation under rather realistic
condition where training DB is plain audio mostly from multiple ordinary people and …

Cited by 8 (2 versions)

Incorporating prosodic events in text-to-speech synthesis

R Sloan, A Adigwe, S Mohandoss… - Proceedings of the …, 2022 - isca-archive.org

While producing accurate prosody can significantly improve the naturalness and
comprehensibility of synthesized speech, many Text-to-Speech (TTS) systems still do not …

Cited by 3 (4 versions)

Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP

J Zhong, Y Li, H Huang, K Richmond, J Liu… - arXiv preprint arXiv …, 2023 - arxiv.org

In expressive and controllable Text-to-Speech (TTS), explicit prosodic features significantly
improve the naturalness and controllability of synthesised speech. However, manual …

Korean Prosody Phrase Boundary Prediction Model for Speech Synthesis Service in Smart Healthcare

M Kim, Y Jung, HC Kwon - Electronics, 2021 - mdpi.com

Speech processing technology has great potential in the medical field to provide beneficial
solutions for both patients and doctors. Speech interfaces, represented by speech synthesis …

Cited by 1 (3 versions)

The effect of different information sources on prosodic boundary perception

B Ludusan, M Morii, Y Minagawa, E Dupoux - JASA Express Letters, 2021 - pubs.aip.org

This study aims to quantify the effect of several information sources: acoustic, higher-level
linguistic, and knowledge of the prosodic system of the language, on the perception of …

Cited by 3 (10 versions)