AVQVC: One-shot voice conversion by vector quantization with applying contrastive learning

H Tang, X Zhang, J Wang, N Cheng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Voice Conversion (VC) refers to changing the timbre of a speech while retaining the
discourse content. Recently, many works have focused on disentangle-based learning …

DRVC: A framework of any-to-any voice conversion with self-supervised learning

Q Wang, X Zhang, J Wang, N Cheng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Any-to-any voice conversion problem aims to convert voices for source and target speakers,
which are out of the training data. Previous works widely utilize the disentangle-based …

StyleTTS-VC: One-shot voice conversion by knowledge transfer from style-based TTS models

YA Li, C Han, N Mesgarani - 2022 IEEE Spoken Language …, 2023 - ieeexplore.ieee.org
One-shot voice conversion (VC) aims to convert speech from any source speaker to an
arbitrary target speaker with only a few seconds of reference speech from the target speaker …

PMVC: Data augmentation-based prosody modeling for expressive voice conversion

Y Deng, H Tang, X Zhang, J Wang, N Cheng… - Proceedings of the 31st …, 2023 - dl.acm.org
Voice conversion as the style transfer task applied to speech, refers to converting one
person's speech into a new speech that sounds like another person's. Up to now, there has …

Zero-shot voice conditioning for denoising diffusion TTS models

A Levkovitch, E Nachmani, L Wolf - arXiv preprint arXiv:2206.02246, 2022 - arxiv.org
We present a novel way of conditioning a pretrained denoising diffusion speech model to
produce speech in the voice of a novel person unseen during training. The method requires …

TDASS: Target domain adaptation speech synthesis framework for multi-speaker low-resource TTS

X Zhang, J Wang, N Cheng… - 2022 International Joint …, 2022 - ieeexplore.ieee.org
Recently, synthesizing personalized speech by text-to-speech (TTS) application is highly
demanded. But the previous TTS models require a mass of target speaker speeches for …

SUSing: SU-net for singing voice synthesis

X Zhang, J Wang, N Cheng… - 2022 International Joint …, 2022 - ieeexplore.ieee.org
Singing voice synthesis is a generative task that involves multi-dimensional control of the
singing model, including lyrics, pitch, and duration, and includes the timbre of the singer and …

Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval

Y Deng, H Tang, X Zhang, N Cheng… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Voice conversion refers to transferring speaker identity with well-preserved content. Better
disentanglement of speech representations leads to better voice conversion. Recent studies …

VQ-CL: Learning disentangled speech representations with contrastive learning and vector quantization

H Tang, X Zhang, J Wang, N Cheng… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Voice Conversion (VC) refers to converting the voice characteristics of audio to another one
as it is said by other people. Recently, more and more studies have focused on disentangle …

Learning speech representations with flexible hidden feature dimensions

H Tang, X Zhang, J Wang, N Cheng… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Non-parallel many-to-many voice conversion is a kind of style transfer task in speech.
Recently, AutoVC has been applied in this field as a popular solution, as it can achieve …