NVC-Net: End-to-end adversarial voice conversion

T Walczyna, Z Piotrowski - Applied sciences, 2023 - mdpi.com

Voice conversion is a process where the essence of a speaker's identity is seamlessly
transferred to another speaker, all while preserving the content of their speech. This usage is …

Cited by 18 Related articles All 5 versions

Deepfakes as a threat to a speaker and facial recognition: An overview of tools and attack vectors

A Firc, K Malinka, P Hanáček - Heliyon, 2023 - cell.com

Deepfakes present an emerging threat in cyberspace. Recent developments in machine
learning make deepfakes highly believable, and very difficult to differentiate between what is …

Cited by 15 Related articles All 7 versions

Generation and detection of manipulated multimodal audiovisual content: Advances, trends and open challenges

H Liz-Lopez, M Keita, A Taleb-Ahmed, A Hadid… - Information …, 2024 - Elsevier

Generative deep learning techniques have invaded the public discourse recently. Despite
the advantages, the applications to disinformation are concerning as the counter-measures …

Cited by 6 Related articles All 4 versions

PMVC: Data augmentation-based prosody modeling for expressive voice conversion

Y Deng, H Tang, X Zhang, J Wang, N Cheng… - Proceedings of the 31st …, 2023 - dl.acm.org

Voice conversion as the style transfer task applied to speech, refers to converting one
person's speech into a new speech that sounds like another person's. Up to now, there has …

Cited by 6 Related articles All 4 versions

Content-dependent fine-grained speaker embedding for zero-shot speaker adaptation in text-to-speech synthesis

Y Zhou, C Song, X Li, L Zhang, Z Wu, Y Bian… - arXiv preprint arXiv …, 2022 - arxiv.org

Zero-shot speaker adaptation aims to clone an unseen speaker's voice without any
adaptation time and parameters. Previous researches usually use a speaker encoder to …

Cited by 17 Related articles All 6 versions

Towards General-Purpose Text-Instruction-Guided Voice Conversion

CY Kuan, CA Li, TY Hsu, TY Lin… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

This paper introduces a novel voice conversion (VC) model, guided by text instructions such
as “articulate slowly with a deep tone” or “speak in a cheerful boyish voice”. Unlike …

Cited by 4 Related articles All 3 versions

Any-to-Any Voice Conversion with F₀ and Timbre Disentanglement and Novel Timbre Conditioning

S Kovela, R Valle, A Dantrey… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Despite recent advances in voice conversion (VC), it is still challenging to do real-time
one-shot voice conversion with good control over timbre and F₀. In this work, we present a PPG …

Cited by 8 Related articles

E2E-S2S-VC: End-to-end sequence-to-sequence voice conversion

T Okamoto, T Toda, H Kawai - Proc. Interspeech, 2023 - okamotocamera.com

Cited by 5 Related articles All 3 versions

Preserving background sound in noise-robust voice conversion via multi-task learning

J Yao, Y Lei, Q Wang, P Guo, Z Ning… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Background sound is an informative form of art that is helpful in providing a more immersive
experience in real-application voice conversion (VC) scenarios. However, prior research …

Cited by 7 Related articles All 3 versions

SLMGAN: Exploiting speech language model representations for unsupervised zero-shot voice conversion in GANs

YA Li, C Han, N Mesgarani - … of Signal Processing to Audio and …, 2023 - ieeexplore.ieee.org

In recent years, large-scale pre-trained speech language models (SLMs) have
demonstrated remarkable advancements in various generative speech modeling …

Cited by 2 Related articles All 3 versions