A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Investigating on incorporating pretrained and learnable speaker representations for multi-speaker multi-style text-to-speech

CM Chien, JH Lin, C Huang, P Hsu… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
The few-shot multi-speaker multi-style voice cloning task is to synthesize utterances with
voice and speaking style similar to a reference speaker given only a few reference samples …

Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis

T Li, X Wang, Q Xie, Z Wang… - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org
The cross-speaker emotion transfer task in text-to-speech (TTS) synthesis particularly aims
to synthesize speech for a target speaker with the emotion transferred from reference …

PMVC: Data augmentation-based prosody modeling for expressive voice conversion

Y Deng, H Tang, X Zhang, J Wang, N Cheng… - Proceedings of the 31st …, 2023 - dl.acm.org
Voice conversion as the style transfer task applied to speech, refers to converting one
person's speech into a new speech that sounds like another person's. Up to now, there has …

Voice filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module

A Gabryś, G Huybrechts, MS Ribeiro… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
State-of-the-art text-to-speech (TTS) systems require several hours of recorded speech data
to generate high-quality synthetic speech. When using reduced amounts of training data …

DGC-vector: A new speaker embedding for zero-shot voice conversion

R Xiao, H Zhang, Y Lin - ICASSP 2022-2022 IEEE International …, 2022 - ieeexplore.ieee.org
Recently, more and more zero-shot voice conversion algorithms have been proposed. As a
fundamental part of zero-shot voice conversion, speaker embeddings are the key to …

A Unified System for Voice Cloning and Voice Conversion through Diffusion Probabilistic Modeling

T Sadekova, V Gogoryan, I Vovk, V Popov… - …, 2022 - isca-archive.org
Text-to-speech and voice conversion are two common speech generation tasks typically
solved using different models. In this paper, we present a novel approach to voice cloning …

Prosody and voice factorization for few-shot speaker adaptation in the challenge M2VoC 2021

T Wang, R Fu, J Yi, J Tao, Z Wen… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
The paper describes the CASIA speech synthesis system entry for challenge M2VoC 2021.
The low similarity and naturalness of synthesized speech remains a challenging problem for …

Neural fusion for voice cloning

B Chen, C Du, K Yu - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
Voice cloning is a technique to build text-to-speech applications for individuals. When only
very limited training data is available, it is challenging to preserve both high speech quality …

Unet-TTS: Improving unseen speaker and style transfer in one-shot voice cloning

R Li, D Pu, M Huang, B Huang - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
One-shot voice cloning aims to transform speaker voice and speaking style in speech
synthesized from a text-to-speech (TTS) system, where only a shot recording from the target …