Overview of voice conversion methods based on deep learning

T Walczyna, Z Piotrowski - Applied sciences, 2023 - mdpi.com
Voice conversion is a process where the essence of a speaker's identity is seamlessly
transferred to another speaker, all while preserving the content of their speech. This usage is …

Deepfakes as a threat to a speaker and facial recognition: An overview of tools and attack vectors

A Firc, K Malinka, P Hanáček - Heliyon, 2023 - cell.com
Deepfakes present an emerging threat in cyberspace. Recent developments in machine
learning make deepfakes highly believable, and very difficult to differentiate between what is …

Generation and detection of manipulated multimodal audiovisual content: Advances, trends and open challenges

H Liz-Lopez, M Keita, A Taleb-Ahmed, A Hadid… - Information …, 2024 - Elsevier
Generative deep learning techniques have invaded the public discourse recently. Despite
the advantages, the applications to disinformation are concerning as the counter-measures …

PMVC: Data augmentation-based prosody modeling for expressive voice conversion

Y Deng, H Tang, X Zhang, J Wang, N Cheng… - Proceedings of the 31st …, 2023 - dl.acm.org
Voice conversion as the style transfer task applied to speech, refers to converting one
person's speech into a new speech that sounds like another person's. Up to now, there has …

Content-dependent fine-grained speaker embedding for zero-shot speaker adaptation in text-to-speech synthesis

Y Zhou, C Song, X Li, L Zhang, Z Wu, Y Bian… - arXiv preprint arXiv …, 2022 - arxiv.org
Zero-shot speaker adaptation aims to clone an unseen speaker's voice without any
adaptation time and parameters. Previous researches usually use a speaker encoder to …

Towards General-Purpose Text-Instruction-Guided Voice Conversion

CY Kuan, CA Li, TY Hsu, TY Lin… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
This paper introduces a novel voice conversion (VC) model, guided by text instructions such
as “articulate slowly with a deep tone” or “speak in a cheerful boyish voice”. Unlike …

Any-to-Any Voice Conversion with F0 and Timbre Disentanglement and Novel Timbre Conditioning

S Kovela, R Valle, A Dantrey… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Despite recent advances in voice conversion (VC), it is still challenging to do real-time
one-shot voice conversion with good control over timbre and F0. In this work, we present a PPG …

E2E-S2S-VC: End-to-end sequence-to-sequence voice conversion

T Okamoto, T Toda, H Kawai - Proc. Interspeech, 2023 - okamotocamera.com

Preserving background sound in noise-robust voice conversion via multi-task learning

J Yao, Y Lei, Q Wang, P Guo, Z Ning… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Background sound is an informative form of art that is helpful in providing a more immersive
experience in real-application voice conversion (VC) scenarios. However, prior research …

SLMGAN: Exploiting speech language model representations for unsupervised zero-shot voice conversion in GANs

YA Li, C Han, N Mesgarani - … of Signal Processing to Audio and …, 2023 - ieeexplore.ieee.org
In recent years, large-scale pre-trained speech language models (SLMs) have
demonstrated remarkable advancements in various generative speech modeling …