TGAVC: Improving autoencoder voice conversion with text-guided and adversarial training

H Tang, X Zhang, J Wang, N Cheng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Voice Conversion (VC) refers to changing the timbre of a speech while retaining the
discourse content. Recently, many works have focused on disentangle-based learning …

Cited by 54 · Related articles · All 3 versions

DRVC: A framework of any-to-any voice conversion with self-supervised learning

Q Wang, X Zhang, J Wang, N Cheng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Any-to-any voice conversion problem aims to convert voices for source and target speakers,
which are out of the training data. Previous works widely utilize the disentangle-based …

Cited by 27 · Related articles · All 4 versions

StyleTTS-VC: One-shot voice conversion by knowledge transfer from style-based TTS models

YA Li, C Han, N Mesgarani - 2022 IEEE Spoken Language …, 2023 - ieeexplore.ieee.org

One-shot voice conversion (VC) aims to convert speech from any source speaker to an
arbitrary target speaker with only a few seconds of reference speech from the target speaker …

Cited by 14 · Related articles · All 6 versions

PMVC: Data augmentation-based prosody modeling for expressive voice conversion

Y Deng, H Tang, X Zhang, J Wang, N Cheng… - Proceedings of the 31st …, 2023 - dl.acm.org

Voice conversion as the style transfer task applied to speech, refers to converting one
person's speech into a new speech that sounds like another person's. Up to now, there has …

Cited by 6 · Related articles · All 4 versions

Zero-shot voice conditioning for denoising diffusion TTS models

A Levkovitch, E Nachmani, L Wolf - arXiv preprint arXiv:2206.02246, 2022 - arxiv.org

We present a novel way of conditioning a pretrained denoising diffusion speech model to
produce speech in the voice of a novel person unseen during training. The method requires …

Cited by 23 · Related articles · All 5 versions

TDASS: Target domain adaptation speech synthesis framework for multi-speaker low-resource TTS

X Zhang, J Wang, N Cheng… - 2022 International Joint …, 2022 - ieeexplore.ieee.org

Recently, synthesizing personalized speech by text-to-speech (TTS) application is highly
demanded. But the previous TTS models require a mass of target speaker speeches for …

Cited by 16 · Related articles · All 4 versions

SUSing: SU-net for singing voice synthesis

X Zhang, J Wang, N Cheng… - 2022 International Joint …, 2022 - ieeexplore.ieee.org

Singing voice synthesis is a generative task that involves multi-dimensional control of the
singing model, including lyrics, pitch, and duration, and includes the timbre of the singer and …

Cited by 15 · Related articles · All 4 versions

Learning Disentangled Speech Representations with Contrastive Learning and Time-Invariant Retrieval

Y Deng, H Tang, X Zhang, N Cheng… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Voice conversion refers to transferring speaker identity with well-preserved content. Better
disentanglement of speech representations leads to better voice conversion. Recent studies …

Cited by 3 · Related articles · All 4 versions

VQ-CL: Learning disentangled speech representations with contrastive learning and vector quantization

H Tang, X Zhang, J Wang, N Cheng… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Voice Conversion (VC) refers to converting the voice characteristics of audio to another one
as it is said by other people. Recently, more and more studies have focused on disentangle …

Cited by 7 · Related articles

Learning speech representations with flexible hidden feature dimensions

H Tang, X Zhang, J Wang, N Cheng… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Non-parallel many-to-many voice conversion is a kind of style transfer task in speech.
Recently, AutoVC has been applied in this field as a popular solution, as it can achieve …

Cited by 5 · Related articles