[HTML] Overview of voice conversion methods based on deep learning
T Walczyna, Z Piotrowski - Applied sciences, 2023 - mdpi.com
Voice conversion is a process where the essence of a speaker's identity is seamlessly
transferred to another speaker, all while preserving the content of their speech. This usage is …
[HTML] Deepfakes as a threat to a speaker and facial recognition: An overview of tools and attack vectors
Deepfakes present an emerging threat in cyberspace. Recent developments in machine
learning make deepfakes highly believable, and very difficult to differentiate between what is …
Generation and detection of manipulated multimodal audiovisual content: Advances, trends and open challenges
Generative deep learning techniques have invaded the public discourse recently. Despite
the advantages, the applications to disinformation are concerning as the counter-measures …
PMVC: Data augmentation-based prosody modeling for expressive voice conversion
Voice conversion as the style transfer task applied to speech, refers to converting one
person's speech into a new speech that sounds like another person's. Up to now, there has …
Content-dependent fine-grained speaker embedding for zero-shot speaker adaptation in text-to-speech synthesis
Zero-shot speaker adaptation aims to clone an unseen speaker's voice without any
adaptation time and parameters. Previous researches usually use a speaker encoder to …
Towards General-Purpose Text-Instruction-Guided Voice Conversion
This paper introduces a novel voice conversion (VC) model, guided by text instructions such
as “articulate slowly with a deep tone” or “speak in a cheerful boyish voice”. Unlike …
Any-to-Any Voice Conversion with F0 and Timbre Disentanglement and Novel Timbre Conditioning
S Kovela, R Valle, A Dantrey… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Despite recent advances in voice conversion (VC), it is still challenging to do real-time one-shot
voice conversion with good control over timbre and F0. In this work, we present a PPG …
[PDF] E2E-S2S-VC: End-to-end sequence-to-sequence voice conversion
Takuma Okamoto, Tomoki Toda …
Preserving background sound in noise-robust voice conversion via multi-task learning
Background sound is an informative form of art that is helpful in providing a more immersive
experience in real-application voice conversion (VC) scenarios. However, prior research …
SLMGAN: Exploiting speech language model representations for unsupervised zero-shot voice conversion in GANs
In recent years, large-scale pre-trained speech language models (SLMs) have
demonstrated remarkable advancements in various generative speech modeling …