An overview of affective speech synthesis and conversion in the deep learning era

A Triantafyllopoulos, BW Schuller… - Proceedings of the …, 2023 - ieeexplore.ieee.org
Speech is the fundamental mode of human communication, and its synthesis has long been
a core priority in human–computer interaction research. In recent years, machines have …

iEmoTTS: Toward robust cross-speaker emotion transfer and control for speech synthesis based on disentanglement between prosody and timbre

G Zhang, Y Qin, W Zhang, J Wu, M Li… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
Cross-speaker emotion transfer is a common approach to generating emotional speech
when speech data with emotion labels from target speakers is not available. This paper …

Disentangling prosody representations with unsupervised speech reconstruction

L Qu, T Li, C Weber, T Pekarek-Rosin… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
Human speech can be characterized by different components, including semantic content,
speaker identity and prosodic information. Significant progress has been made in …

Applying the information bottleneck principle to prosodic representation learning

G Zhang, Y Qin, D Tan, T Lee - arXiv preprint arXiv:2108.02821, 2021 - arxiv.org
This paper describes a novel design of a neural network-based speech generation model
for learning prosodic representation. The problem of representation learning is formulated …

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

G Zhang, T Merritt, MS Ribeiro, B Tura-Vecino… - arXiv preprint arXiv …, 2023 - arxiv.org
Neural text-to-speech systems are often optimized on L1/L2 losses, which make strong
assumptions about the distributions of the target data space. Aiming to improve those …

Estimation of Hazardous Environments Through Speech and Ambient Noise Analysis

AV Porco, K Dongshik - International Journal of Advanced …, 2023 - search.ebscohost.com
In recent years, significant attention has been directed towards the development of artificial
empathy within the engineering academic community. Replicating artificial empathy …

Enhancing Emotion Classification Through Speech and Correlated Emotional Sounds via a Variational Auto-Encoder Model with Prosodic Regularization

AV Porco, D Kang - 2023 IEEE International Conference on …, 2023 - ieeexplore.ieee.org
Recent studies have explored the development of empathetic systems, capable of engaging
in human-like communication and support in daily life tasks. This envision primarily requires …

Analysis of Acoustic Correlates of Marathi Prosodic Features for Human-Machine Interaction

TK Harhare, M Shah - 2022 International Conference on …, 2022 - ieeexplore.ieee.org
The prosodic feature study gives insight into the suprasegmental properties of speech.
Researchers have been studying acoustic features of prosody worldwide to design a model …