An overview of voice conversion and its challenges: From statistical modeling to deep learning
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …
The attacker's perspective on automatic speaker verification: An overview
Security of automatic speaker verification (ASV) systems is compromised by various
spoofing attacks. While many types of non-proactive attacks (and their defenses) have been …
Voice conversion challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion
The voice conversion challenge is a bi-annual scientific event held to compare and
understand different voice conversion (VC) systems built on a common dataset. In 2020, we …
Transfer learning from speech synthesis to voice conversion with non-parallel training data
We present a novel voice conversion (VC) framework by learning from a text-to-speech
(TTS) synthesis system, that is called TTS-VC transfer learning or TTL-VC for short. We first …
Cross-lingual voice conversion with bilingual phonetic posteriorgram and average modeling
This paper presents a cross-lingual voice conversion approach using bilingual Phonetic
PosteriorGram (PPG) and average modeling. The proposed approach makes use of …
Nautilus: a versatile voice cloning system
HT Luong, J Yamagishi - IEEE/ACM Transactions on Audio …, 2020 - ieeexplore.ieee.org
We introduce a novel speech synthesis system, called NAUTILUS, that can generate speech
with a target voice either from a text input or a reference utterance of an arbitrary source …
ACE-VC: Adaptive and controllable voice conversion using explicitly disentangled self-supervised speech representations
In this work, we propose a zero-shot voice conversion method using speech representations
trained with self-supervised learning. First, we develop a multi-task model to decompose a …
Language agnostic speaker embedding for cross-lingual personalized speech generation
Cross-lingual personalized speech generation seeks to synthesize a target speaker's voice
from only a few training samples that are in a different language. One popular technique is to …
Voice conversion for whispered speech synthesis
We present an approach to synthesize whisper by applying a handcrafted signal processing
recipe and Voice Conversion (VC) techniques to convert normally phonated speech to …
An improved StarGAN for emotional voice conversion: Enhancing voice quality and data augmentation
Emotional Voice Conversion (EVC) aims to convert the emotional style of a source speech
signal to a target style while preserving its content and speaker identity information. Previous …