An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

An overview of voice conversion systems

SH Mohammadi, A Kain - Speech Communication, 2017 - Elsevier
Voice transformation (VT) aims to change one or more aspects of a speech signal while
preserving linguistic information. A subset of VT, Voice conversion (VC) specifically aims to …

CycleGAN-VC2: Improved CycleGAN-based non-parallel voice conversion

T Kaneko, H Kameoka, K Tanaka… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
Non-parallel voice conversion (VC) is a technique for learning the mapping from source to
target speech without relying on parallel data. This is an important task, but it has been …

CycleGAN-VC: Non-parallel voice conversion using cycle-consistent adversarial networks

T Kaneko, H Kameoka - 2018 26th European Signal …, 2018 - ieeexplore.ieee.org
We propose a non-parallel voice-conversion (VC) method that can learn a mapping from
source to target speech without relying on parallel data. The proposed method is particularly …

Parallel-data-free voice conversion using cycle-consistent adversarial networks

T Kaneko, H Kameoka - arXiv preprint arXiv:1711.11293, 2017 - arxiv.org
We propose a parallel-data-free voice-conversion (VC) method that can learn a mapping
from source to target speech without relying on parallel data. The proposed method is …

StarGAN-VC2: Rethinking conditional methods for StarGAN-based voice conversion

T Kaneko, H Kameoka, K Tanaka, N Hojo - arXiv preprint arXiv …, 2019 - arxiv.org
Non-parallel multi-domain voice conversion (VC) is a technique for learning mappings
among multiple domains without relying on parallel data. This is important but challenging …

Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory

T Toda, AW Black, K Tokuda - IEEE Transactions on Audio …, 2007 - ieeexplore.ieee.org
In this paper, we describe a novel spectral conversion method for voice conversion (VC). A
Gaussian mixture model (GMM) of the joint probability density of source and target features …

Learning latent representations for speech generation and transformation

WN Hsu, Y Zhang, J Glass - arXiv preprint arXiv:1704.04222, 2017 - arxiv.org
An ability to model a generative process and learn a latent representation for speech in an
unsupervised fashion will be crucial to process vast quantities of unlabelled speech data …

Spectral mapping using artificial neural networks for voice conversion

S Desai, AW Black, B Yegnanarayana… - IEEE Transactions on …, 2010 - ieeexplore.ieee.org
In this paper, we use artificial neural networks (ANNs) for voice conversion and exploit the
mapping abilities of an ANN model to perform mapping of spectral features of a source …

AttS2S-VC: Sequence-to-sequence voice conversion with attention and context preservation mechanisms

K Tanaka, H Kameoka, T Kaneko… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
This paper describes a method based on a sequence-to-sequence learning (Seq2Seq) with
attention and context preservation mechanism for voice conversion (VC) tasks. Seq2Seq …