Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward
Easy access to audio-visual content on social media, combined with the availability of
modern tools such as TensorFlow or Keras, and open-source trained models, along with …
An overview of voice conversion and its challenges: From statistical modeling to deep learning
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …
A survey on audio diffusion models: Text to speech synthesis and enhancement in generative AI
Generative AI has demonstrated impressive performance in various fields, among which
speech synthesis is an interesting direction. With the diffusion model as the most popular …
Spoofing and countermeasures for speaker verification: A survey
While biometric authentication has advanced significantly in recent years, evidence shows
the technology can be susceptible to malicious spoofing attacks. The research community …
Audit: Audio editing by following instructions with latent diffusion models
Audio editing is applicable for various purposes, such as adding background sound effects,
replacing a musical instrument, and repairing damaged audio. Recently, some diffusion …
Data augmentation for deep neural network acoustic modeling
This paper investigates data augmentation for deep neural network acoustic modeling
based on label-preserving transformations to deal with data sparsity. Two data …
An overview of voice conversion systems
SH Mohammadi, A Kain - Speech Communication, 2017 - Elsevier
Voice transformation (VT) aims to change one or more aspects of a speech signal while
preserving linguistic information. A subset of VT, Voice conversion (VC) specifically aims to …
Continuous probabilistic transform for voice conversion
Y Stylianou, O Cappé… - IEEE Transactions on …, 1998 - ieeexplore.ieee.org
Voice conversion, as considered in this paper, is defined as modifying the speech signal of
one speaker (source speaker) so that it sounds as if it had been pronounced by a different …
Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
In this paper, we describe a novel spectral conversion method for voice conversion (VC). A
Gaussian mixture model (GMM) of the joint probability density of source and target features …
Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends
Hidden Markov models (HMMs) and Gaussian mixture models (GMMs) are the two most
common types of acoustic models used in statistical parametric approaches for generating …