An overview of voice conversion and its challenges: From statistical modeling to deep learning
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …
Deepfakes as a threat to a speaker and facial recognition: An overview of tools and attack vectors
Deepfakes present an emerging threat in cyberspace. Recent developments in machine
learning make deepfakes highly believable, and very difficult to differentiate between what is …
Towards end-to-end synthetic speech detection
The constant Q transform (CQT) has been shown to be one of the most effective speech
signal pre-transforms to facilitate synthetic speech detection, followed by either hand-crafted …
Transfer learning from speech synthesis to voice conversion with non-parallel training data
We present a novel voice conversion (VC) framework by learning from a text-to-speech
(TTS) synthesis system, that is called TTS-VC transfer learning or TTL-VC for short. We first …
Significance of subband features for synthetic speech detection
In text-to-speech or voice conversion based synthetic speech detection, it is a common
practice that spectral information over the entire frequency band is used for feature …
Cross-lingual voice conversion with bilingual phonetic posteriorgram and average modeling
This paper presents a cross-lingual voice conversion approach using bilingual Phonetic
PosteriorGram (PPG) and average modeling. The proposed approach makes use of …
Modified magnitude-phase spectrum information for spoofing detection
Most of the existing feature representations for spoofing countermeasures consider
information either from the magnitude or phase spectrum. We hypothesize that both …
ASSD: Synthetic Speech Detection in the AAC Compressed Domain
Synthetic human speech signals have become very easy to generate given modern text-to-
speech methods. When these signals are shared on social media they are often …
Unsupervised representation disentanglement using cross domain features and adversarial learning in variational autoencoder based voice conversion
An effective approach for voice conversion (VC) is to disentangle linguistic content from
other components in the speech signal. The effectiveness of variational autoencoder (VAE) …
Extraction of octave spectra information for spoofing attack detection
This article focuses on extracting information from the octave power spectra of long-term
constant-Q transform (CQT) for spoofing attack detection. A novel framework based on multi …