Learning latent representations for style control and transfer in end-to-end speech synthesis

N Kaur, P Singh - Artificial Intelligence Review, 2023 - Springer

Nowadays, speech synthesis or text-to-speech (TTS), the ability of a system to produce
human-like, natural-sounding voice from written text, is gaining popularity in the field of speech …

Cited by 43 · Related articles · All 3 versions

A review of deep learning based speech synthesis

Y Ning, S He, Z Wu, C Xing, LJ Zhang - Applied Sciences, 2019 - mdpi.com

Speech synthesis, also known as text-to-speech (TTS), has attracted increasingly more
attention. Recent advances on speech synthesis are overwhelmingly contributed by deep …

Cited by 206 · Related articles · All 6 versions

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Cited by 418 · Related articles · All 2 versions

Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis

G Sun, Y Zhang, RJ Weiss, Y Cao… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

This paper proposes a hierarchical, fine-grained and interpretable latent variable model for
prosody based on the Tacotron 2 text-to-speech model. It achieves multi-resolution …

Cited by 152 · Related articles · All 5 versions

PromptStyle: Controllable style transfer for text-to-speech with natural language descriptions

G Liu, Y Zhang, Y Lei, Y Chen, R Wang, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org

Style transfer TTS has shown impressive performance in recent years. However, style
control is often restricted to systems built on expressive speech recordings with discrete style …

Cited by 33 · Related articles · All 4 versions

MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis

Y Lei, S Yang, X Wang, L Xie - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org

Expressive synthetic speech is essential for many human-computer interaction and audio
broadcast scenarios, and thus synthesizing expressive speech has attracted much attention …

Cited by 71 · Related articles · All 4 versions

Controllable emotion transfer for end-to-end speech synthesis

T Li, S Yang, L Xue, L Xie - 2021 12th International Symposium …, 2021 - ieeexplore.ieee.org

Emotion embedding space learned from references is a straight-forward approach for
emotion transfer in encoder-decoder structured emotional text to speech (TTS) systems …

Cited by 87 · Related articles · All 3 versions

Emotion intensity and its control for emotional voice conversion

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while
preserving the linguistic content and speaker identity. In EVC, emotions are usually treated …

Cited by 51 · Related articles · All 7 versions

Speech synthesis with mixed emotions

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Emotional speech synthesis aims to synthesize human voices with various emotional effects.
The current studies are mostly focused on imitating an averaged style belonging to a specific …

Cited by 45 · Related articles · All 7 versions

Emotional speech synthesis with rich and granularized control

SY Um, S Oh, K Byun, I Jang, CH Ahn… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

This paper proposes an effective emotion control method for an end-to-end text-to-speech
(TTS) system. To flexibly control the distinct characteristic of a target emotion category, it is …

Cited by 101 · Related articles · All 5 versions