An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

A survey on voice assistant security: Attacks and countermeasures

C Yan, X Ji, K Wang, Q Jiang, Z Jin, W Xu - ACM Computing Surveys, 2022 - dl.acm.org
Voice assistants (VA) have become prevalent on a wide range of personal devices such as
smartphones and smart speakers. As companies build voice assistants with extra …

Voice conversion challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion

Y Zhao, WC Huang, X Tian, J Yamagishi… - arXiv preprint arXiv …, 2020 - arxiv.org
The voice conversion challenge is a bi-annual scientific event held to compare and
understand different voice conversion (VC) systems built on a common dataset. In 2020, we …

Disentangling voice and content with self-supervision for speaker recognition

T Liu, KA Lee, Q Wang, H Li - Advances in Neural …, 2023 - proceedings.neurips.cc
For speaker recognition, it is difficult to extract an accurate speaker representation from
speech because of its mixture of speaker traits and content. This paper proposes a …

Nautilus: a versatile voice cloning system

HT Luong, J Yamagishi - IEEE/ACM Transactions on Audio …, 2020 - ieeexplore.ieee.org
We introduce a novel speech synthesis system, called NAUTILUS, that can generate speech
with a target voice either from a text input or a reference utterance of an arbitrary source …

Foreign Accent Conversion by Synthesizing Speech from Phonetic Posteriorgrams

G Zhao, S Ding, R Gutierrez-Osuna - Interspeech, 2019 - isca-archive.org
Methods for foreign accent conversion (FAC) aim to generate speech that sounds similar to
a given non-native speaker but with the accent of a native speaker. Conventional FAC …

Pretraining techniques for sequence-to-sequence voice conversion

WC Huang, T Hayashi, YC Wu… - … /ACM Transactions on …, 2021 - ieeexplore.ieee.org
Sequence-to-sequence (seq2seq) voice conversion (VC) models are attractive owing to
their ability to convert prosody. Nonetheless, without sufficient data, seq2seq VC models can …

Duration controllable voice conversion via phoneme-based information bottleneck

SH Lee, HR Noh, WJ Nam… - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org
Several voice conversion (VC) methods using a simple autoencoder with a carefully
designed information bottleneck have recently been studied. In general, they extract content …

Accentron: Foreign accent conversion to arbitrary non-native speakers using zero-shot learning

S Ding, G Zhao, R Gutierrez-Osuna - Computer Speech & Language, 2022 - Elsevier
Foreign accent conversion (FAC) aims to create a new voice that has the voice identity of a
given second-language (L2) speaker but with a native (L1) accent. Previous FAC …

SINGAN: Singing voice conversion with generative adversarial networks

B Sisman, K Vijayan, M Dong… - 2019 Asia-Pacific Signal …, 2019 - ieeexplore.ieee.org
Singing voice conversion (SVC) is a task to convert the source singer's voice to sound like
that of the target singer, without changing the lyrical content. So far, most of the voice …