Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward

M Masood, M Nawaz, KM Malik, A Javed, A Irtaza… - Applied …, 2023 - Springer
Easy access to audio-visual content on social media, combined with the availability of
modern tools such as TensorFlow or Keras, and open-source trained models, along with …

An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

An overview of deep-learning-based audio-visual speech enhancement and separation

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org
Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

ASVspoof 2019: A large-scale public database of synthesized, converted and replayed speech

X Wang, J Yamagishi, M Todisco, H Delgado… - Computer Speech & …, 2020 - Elsevier
Automatic speaker verification (ASV) is one of the most natural and convenient means of
biometric person recognition. Unfortunately, just like all other biometric systems, ASV is …

WaveNet: A generative model for raw audio

A Van Den Oord, S Dieleman, H Zen… - arXiv preprint arXiv …, 2016 - academia.edu
This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.
The model is fully probabilistic and autoregressive, with the predictive distribution for each …

A survey on audio diffusion models: Text to speech synthesis and enhancement in generative AI

C Zhang, C Zhang, S Zheng, M Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative AI has demonstrated impressive performance in various fields, among which
speech synthesis is an interesting direction. With the diffusion model as the most popular …

The voice conversion challenge 2018: Promoting development of parallel and nonparallel methods

J Lorenzo-Trueba, J Yamagishi, T Toda, D Saito… - arXiv preprint arXiv …, 2018 - arxiv.org
We present the Voice Conversion Challenge 2018, designed as a follow-up to the 2016
edition with the aim of providing a common framework for evaluating and comparing …

Voice conversion from unaligned corpora using variational autoencoding Wasserstein generative adversarial networks

CC Hsu, HT Hwang, YC Wu, Y Tsao… - arXiv preprint arXiv …, 2017 - arxiv.org
Building a voice conversion (VC) system from non-parallel speech corpora is challenging
but highly valuable in real application scenarios. In most situations, the source and the target …

ASVspoof: the automatic speaker verification spoofing and countermeasures challenge

Z Wu, J Yamagishi, T Kinnunen… - IEEE Journal of …, 2017 - ieeexplore.ieee.org
Concerns regarding the vulnerability of automatic speaker verification (ASV) technology
against spoofing can undermine confidence in its reliability and form a barrier to exploitation …