HIGNN-TTS: Hierarchical Prosody Modeling With Graph Neural Networks for Expressive Long-Form TTS

D Guo, X Zhu, L Xue, T Li, Y Lv… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Recent advances in text-to-speech, particularly those based on Graph Neural Networks
(GNNs), have significantly improved the expressiveness of short-form synthetic speech …

RWEN-TTS: relation-aware word encoding network for natural text-to-speech synthesis

S Oh, HR Noh, Y Hong, I Oh - Proceedings of the AAAI Conference on …, 2023 - ojs.aaai.org
With the advent of deep learning, a huge number of text-to-speech (TTS) models which
produce human-like speech have emerged. Recently, by introducing syntactic and semantic …

Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP

J Zhong, Y Li, H Huang, J Liu, Z Su, J Guo… - arXiv preprint arXiv …, 2023 - arxiv.org
In the realm of expressive Text-to-Speech (TTS), explicit prosodic boundaries significantly
advance the naturalness and controllability of synthesized speech. While human prosody …

[CITATION][C] Content-aware text-to-speech with prompt-based prosody control

T Bott - 2023