An overview of affective speech synthesis and conversion in the deep learning era

A Triantafyllopoulos, BW Schuller… - Proceedings of the …, 2023 - ieeexplore.ieee.org
Speech is the fundamental mode of human communication, and its synthesis has long been
a core priority in human–computer interaction research. In recent years, machines have …

EmoMix: Emotion mixing via diffusion models for emotional speech synthesis

H Tang, X Zhang, J Wang, N Cheng, J Xiao - arXiv preprint arXiv …, 2023 - arxiv.org
There has been significant progress in emotional Text-To-Speech (TTS) synthesis
technology in recent years. However, existing methods primarily focus on the synthesis of a …

StarGAN for emotional speech conversion: Validated by data augmentation of end-to-end emotion recognition

G Rizos, A Baird, M Elliott… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
In this paper, we propose an adversarial network implementation for speech emotion
conversion as a data augmentation method, validated by a multi-class speech affect …

MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis

Y Lei, S Yang, X Wang, L Xie - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org
Expressive synthetic speech is essential for many human-computer interaction and audio
broadcast scenarios, and thus synthesizing expressive speech has attracted much attention …

EMOVIE: A Mandarin emotion speech dataset with a simple emotional text-to-speech model

C Cui, Y Ren, J Liu, F Chen, R Huang, M Lei… - arXiv preprint arXiv …, 2021 - arxiv.org
Recently, there has been an increasing interest in neural speech synthesis. While the deep
neural network achieves the state-of-the-art result in text-to-speech (TTS) tasks, how to …

Speech synthesis with mixed emotions

K Zhou, B Sisman, R Rana… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Emotional speech synthesis aims to synthesize human voices with various emotional effects.
The current studies are mostly focused on imitating an averaged style belonging to a specific …

Controllable emotion transfer for end-to-end speech synthesis

T Li, S Yang, L Xue, L Xie - 2021 12th International Symposium …, 2021 - ieeexplore.ieee.org
Emotion embedding space learned from references is a straight-forward approach for
emotion transfer in encoder-decoder structured emotional text to speech (TTS) systems …

Reinforcement learning for emotional text-to-speech synthesis with improved emotion discriminability

R Liu, B Sisman, H Li - arXiv preprint arXiv:2104.01408, 2021 - arxiv.org
Emotional text-to-speech synthesis (ETTS) has seen much progress in recent years.
However, the generated voice is often not perceptually identifiable by its intended emotion …

Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis

T Li, X Wang, Q Xie, Z Wang… - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org
The cross-speaker emotion transfer task in text-to-speech (TTS) synthesis particularly aims
to synthesize speech for a target speaker with the emotion transferred from reference …

Controlling emotion strength with relative attribute for end-to-end speech synthesis

X Zhu, S Yang, G Yang, L Xie - 2019 IEEE Automatic Speech …, 2019 - ieeexplore.ieee.org
Recently, attention-based end-to-end speech synthesis has achieved superior performance
compared to traditional speech synthesis models, and several approaches like global style …