Conventional and contemporary approaches used in text to speech synthesis: A review

N Kaur, P Singh - Artificial Intelligence Review, 2023 - Springer
Nowadays speech synthesis or text to speech (TTS), an ability of system to produce human
like natural sounding voice from the written text, is gaining popularity in the field of speech …

A review of deep learning based speech synthesis

Y Ning, S He, Z Wu, C Xing, LJ Zhang - Applied Sciences, 2019 - mdpi.com
Speech synthesis, also known as text-to-speech (TTS), has attracted increasingly more
attention. Recent advances on speech synthesis are overwhelmingly contributed by deep …

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis

G Sun, Y Zhang, RJ Weiss, Y Cao… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
This paper proposes a hierarchical, fine-grained and interpretable latent variable model for
prosody based on the Tacotron 2 text-to-speech model. It achieves multi-resolution …

PromptStyle: Controllable style transfer for text-to-speech with natural language descriptions

G Liu, Y Zhang, Y Lei, Y Chen, R Wang, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Style transfer TTS has shown impressive performance in recent years. However, style
control is often restricted to systems built on expressive speech recordings with discrete style …

MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis

Y Lei, S Yang, X Wang, L Xie - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org
Expressive synthetic speech is essential for many human-computer interaction and audio
broadcast scenarios, and thus synthesizing expressive speech has attracted much attention …

Controllable emotion transfer for end-to-end speech synthesis

T Li, S Yang, L Xue, L Xie - 2021 12th International Symposium …, 2021 - ieeexplore.ieee.org
Emotion embedding space learned from references is a straight-forward approach for
emotion transfer in encoder-decoder structured emotional text to speech (TTS) systems …

Emotion intensity and its control for emotional voice conversion

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Emotional voice conversion (EVC) seeks to convert the emotional state of an utterance while
preserving the linguistic content and speaker identity. In EVC, emotions are usually treated …

Speech synthesis with mixed emotions

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Emotional speech synthesis aims to synthesize human voices with various emotional effects.
The current studies are mostly focused on imitating an averaged style belonging to a specific …

Emotional speech synthesis with rich and granularized control

SY Um, S Oh, K Byun, I Jang, CH Ahn… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
This paper proposes an effective emotion control method for an end-to-end text-to-speech
(TTS) system. To flexibly control the distinct characteristic of a target emotion category, it is …