Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward
Easy access to audio-visual content on social media, combined with the availability of
modern tools such as TensorFlow or Keras, and open-source trained models, along with …
An overview of voice conversion and its challenges: From statistical modeling to deep learning
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …
A survey on neural speech synthesis
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …
An overview of deep-learning-based audio-visual speech enhancement and separation
Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …
ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech
Automatic speaker verification (ASV) is one of the most natural and convenient means of
biometric person recognition. Unfortunately, just like all other biometric systems, ASV is …
WaveNet: A generative model for raw audio
This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.
The model is fully probabilistic and autoregressive, with the predictive distribution for each …
A survey on audio diffusion models: Text to speech synthesis and enhancement in generative AI
Generative AI has demonstrated impressive performance in various fields, among which
speech synthesis is an interesting direction. With the diffusion model as the most popular …
The voice conversion challenge 2018: Promoting development of parallel and nonparallel methods
We present the Voice Conversion Challenge 2018, designed as a follow up to the 2016
edition with the aim of providing a common framework for evaluating and comparing …
Voice conversion from unaligned corpora using variational autoencoding Wasserstein generative adversarial networks
Building a voice conversion (VC) system from non-parallel speech corpora is challenging
but highly valuable in real application scenarios. In most situations, the source and the target …
ASVspoof: the automatic speaker verification spoofing and countermeasures challenge
Concerns regarding the vulnerability of automatic speaker verification (ASV) technology
against spoofing can undermine confidence in its reliability and form a barrier to exploitation …