HIGNN-TTS: Hierarchical Prosody Modeling With Graph Neural Networks for Expressive Long-Form TTS
Recent advances in text-to-speech, particularly those based on Graph Neural Networks
(GNNs), have significantly improved the expressiveness of short-form synthetic speech …
(GNNs), have significantly improved the expressiveness of short-form synthetic speech …
RWEN-TTS: relation-aware word encoding network for natural text-to-speech synthesis
S Oh, HR Noh, Y Hong, I Oh - Proceedings of the AAAI Conference on …, 2023 - ojs.aaai.org
With the advent of deep learning, a huge number of text-to-speech (TTS) models which
produce human-like speech have emerged. Recently, by introducing syntactic and semantic …
produce human-like speech have emerged. Recently, by introducing syntactic and semantic …
Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP
J Zhong, Y Li, H Huang, J Liu, Z Su, J Guo… - arXiv preprint arXiv …, 2023 - arxiv.org
In the realm of expressive Text-to-Speech (TTS), explicit prosodic boundaries significantly
advance the naturalness and controllability of synthesized speech. While human prosody …
advance the naturalness and controllability of synthesized speech. While human prosody …