An overview of affective speech synthesis and conversion in the deep learning era
A Triantafyllopoulos, BW Schuller… - Proceedings of the …, 2023 - ieeexplore.ieee.org
Speech is the fundamental mode of human communication, and its synthesis has long been
a core priority in human–computer interaction research. In recent years, machines have …
EmoMix: Emotion mixing via diffusion models for emotional speech synthesis
There has been significant progress in emotional Text-To-Speech (TTS) synthesis
technology in recent years. However, existing methods primarily focus on the synthesis of a …
StarGAN for emotional speech conversion: Validated by data augmentation of end-to-end emotion recognition
In this paper, we propose an adversarial network implementation for speech emotion
conversion as a data augmentation method, validated by a multi-class speech affect …
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis
Expressive synthetic speech is essential for many human-computer interaction and audio
broadcast scenarios, and thus synthesizing expressive speech has attracted much attention …
EMOVIE: A Mandarin emotion speech dataset with a simple emotional text-to-speech model
Recently, there has been an increasing interest in neural speech synthesis. While the deep
neural network achieves the state-of-the-art result in text-to-speech (TTS) tasks, how to …
Speech synthesis with mixed emotions
Emotional speech synthesis aims to synthesize human voices with various emotional effects.
The current studies are mostly focused on imitating an averaged style belonging to a specific …
Controllable emotion transfer for end-to-end speech synthesis
Emotion embedding space learned from references is a straightforward approach for
emotion transfer in encoder-decoder structured emotional text to speech (TTS) systems …
Reinforcement learning for emotional text-to-speech synthesis with improved emotion discriminability
Emotional text-to-speech synthesis (ETTS) has seen much progress in recent years.
However, the generated voice is often not perceptually identifiable by its intended emotion …
Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis
The cross-speaker emotion transfer task in text-to-speech (TTS) synthesis particularly aims
to synthesize speech for a target speaker with the emotion transferred from reference …
Controlling emotion strength with relative attribute for end-to-end speech synthesis
Recently, attention-based end-to-end speech synthesis has achieved superior performance
compared to traditional speech synthesis models, and several approaches like global style …