Average Modeling Approach to Voice Conversion with Non-Parallel Data.

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org

Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

Cited by 371 · All 8 versions

The attacker's perspective on automatic speaker verification: An overview

RK Das, X Tian, T Kinnunen, H Li - arXiv preprint arXiv:2004.08849, 2020 - arxiv.org

Security of automatic speaker verification (ASV) systems is compromised by various
spoofing attacks. While many types of non-proactive attacks (and their defenses) have been …

Cited by 79 · All 7 versions

Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion

Y Zhao, WC Huang, X Tian, J Yamagishi… - arXiv preprint arXiv …, 2020 - arxiv.org

The voice conversion challenge is a bi-annual scientific event held to compare and
understand different voice conversion (VC) systems built on a common dataset. In 2020, we …

Cited by 231 · All 10 versions

Transfer learning from speech synthesis to voice conversion with non-parallel training data

M Zhang, Y Zhou, L Zhao, H Li - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org

We present a novel voice conversion (VC) framework by learning from a text-to-speech
(TTS) synthesis system, that is called TTS-VC transfer learning or TTL-VC for short. We first …

Cited by 59 · All 5 versions

Cross-lingual voice conversion with bilingual phonetic posteriorgram and average modeling

Y Zhou, X Tian, H Xu, RK Das… - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org

This paper presents a cross-lingual voice conversion approach using bilingual Phonetic
PosteriorGram (PPG) and average modeling. The proposed approach makes use of …

Cited by 91 · All 5 versions

Nautilus: a versatile voice cloning system

HT Luong, J Yamagishi - IEEE/ACM Transactions on Audio …, 2020 - ieeexplore.ieee.org

We introduce a novel speech synthesis system, called NAUTILUS, that can generate speech
with a target voice either from a text input or a reference utterance of an arbitrary source …

Cited by 57 · All 5 versions

ACE-VC: Adaptive and controllable voice conversion using explicitly disentangled self-supervised speech representations

S Hussain, P Neekhara, J Huang, J Li… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

In this work, we propose a zero-shot voice conversion method using speech representations
trained with self-supervised learning. First, we develop a multi-task model to decompose a …

Cited by 15 · All 4 versions

Language agnostic speaker embedding for cross-lingual personalized speech generation

Y Zhou, X Tian, H Li - IEEE/ACM Transactions on Audio …, 2021 - ieeexplore.ieee.org

Cross-lingual personalized speech generation seeks to synthesize a target speaker's voice
from only a few training samples that are in a different language. One popular technique is to …

Cited by 18 · All 3 versions

Voice conversion for whispered speech synthesis

M Cotescu, T Drugman, G Huybrechts… - IEEE Signal …, 2019 - ieeexplore.ieee.org

We present an approach to synthesize whisper by applying a handcrafted signal processing
recipe and Voice Conversion (VC) techniques to convert normally phonated speech to …

Cited by 35 · All 8 versions

An improved StarGAN for emotional voice conversion: Enhancing voice quality and data augmentation

X He, J Chen, G Rizos, BW Schuller - arXiv preprint arXiv:2107.08361, 2021 - arxiv.org

Emotional Voice Conversion (EVC) aims to convert the emotional style of a source speech
signal to a target style while preserving its content and speaker identity information. Previous …

Cited by 17 · All 9 versions