Integrating Discrete Word-Level Style Variations into Non-Autoregressive Acoustic Models...

MSStyleTTS: Multi-scale style modeling with hierarchical context information for expressive speech synthesis

S Lei, Y Zhou, L Chen, Z Wu, X Wu… - … /ACM Transactions on …, 2023 - ieeexplore.ieee.org

Expressive speech synthesis is crucial for many human-computer interaction scenarios,
such as audiobooks, podcasts, and voice assistants. Previous works focus on predicting the …

Cited by 7. All 5 versions.

[PDF] arxiv.org

Context-aware coherent speaking style prediction with hierarchical transformers for audiobook speech synthesis

S Lei, Y Zhou, L Chen, Z Wu, S Kang… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Recent advances in text-to-speech have significantly improved the expressiveness of
synthesized speech. However, it is still challenging to generate speech with contextually …

Cited by 5. All 4 versions.

PE-Wav2vec: A Prosody-Enhanced Speech Model for Self-Supervised Prosody Learning in TTS

ZC Liu, L Chen, YJ Hu, ZH Ling… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

This paper investigates leveraging large-scale untranscribed speech data to enhance the
prosody modelling capability of text-to-speech (TTS) models. On the basis of the self …

[PDF] Speech Synthesis with Self-Supervisedly Learnt Prosodic Representations

ZC Liu, ZH Ling, YJ Hu, J Pan, YD Wu, JW Wang - isca-archive.org

This paper presents S4LPR, a Speech Synthesis model conditioned on Self-Supervisedly
Learnt Prosodic Representations. Instead of using raw acoustic features, such as F0 and …

Cited by 2. All 2 versions.

[PDF] nancygao.com

[CITATION] Human-Computer Interaction for Virtual-Real Fusion (面向虚实融合的人机交互)

J Tao, J Gong, N Gao, S Fu, S Liang, C Yu (陶建华，龚江涛，高楠，傅四维，梁山，喻纯) - Journal of Image and Graphics (中国图象图形学报), 2023