iEmoTTS: Toward robust cross-speaker emotion transfer and control for speech synthesis based on disentanglement between prosody and timbre

G Zhang, Y Qin, W Zhang, J Wu, M Li… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
Cross-speaker emotion transfer is a common approach to generating emotional speech
when speech data with emotion labels from target speakers is not available. This paper …

Mixed-Phoneme BERT: Improving BERT with mixed phoneme and sup-phoneme representations for text to speech

G Zhang, K Song, X Tan, D Tan, Y Yan, Y Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
Recently, leveraging BERT pre-training to improve the phoneme encoder in text-to-speech
(TTS) has drawn increasing attention. However, these works apply pre-training with character …

[PDF] Integrating Discrete Word-Level Style Variations into Non-Autoregressive Acoustic Models for Speech Synthesis

Z Liu, NQ Wu, Y Zhang, Z Ling - INTERSPEECH, 2022 - isca-archive.org
This paper presents a method of integrating word-level style variations (WSVs) into non-
autoregressive acoustic models for speech synthesis. WSVs are discrete latent …

[PDF] Speech Synthesis with Self-Supervisedly Learnt Prosodic Representations

ZC Liu, ZH Ling, YJ Hu, J Pan, YD Wu, JW Wang - isca-archive.org
This paper presents S4LPR, a Speech Synthesis model conditioned on Self-Supervisedly
Learnt Prosodic Representations. Instead of using raw acoustic features, such as F0 and …