A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
BigVGAN: A universal neural vocoder with large-scale training
Despite recent progress in generative adversarial network (GAN)-based vocoders, where
the model generates raw waveform conditioned on acoustic features, it is challenging to …
UnivNet: A neural vocoder with multi-resolution spectrogram discriminators for high-fidelity waveform generation
Most neural vocoders employ band-limited mel-spectrograms to generate waveforms. If full-
band spectral features are used as the input, the vocoder can be provided with as much …
Video and audio deepfake datasets and open issues in deepfake technology: being ahead of the curve
Z Akhtar, TL Pendyala, VS Athmakuri - Forensic Sciences, 2024 - mdpi.com
The revolutionary breakthroughs in Machine Learning (ML) and Artificial Intelligence (AI) are
extensively being harnessed across a diverse range of domains, e.g., forensic science …
iSTFTNet: Fast and lightweight mel-spectrogram vocoder incorporating inverse short-time Fourier transform
In recent text-to-speech synthesis and voice conversion systems, a mel-spectrogram is
commonly applied as an intermediate representation, and the necessity for a mel …
CFAD: A Chinese dataset for fake audio detection
Fake audio detection is a growing concern and some relevant datasets have been designed
for research. However, there is no standard public Chinese dataset under complex …
ESPnet2-TTS: Extending the edge of TTS research
This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS) toolkit.
ESPnet2-TTS extends our earlier version, ESPnet-TTS, by adding many new features …
Avocodo: Generative adversarial network for artifact-free vocoder
Neural vocoders based on the generative adversarial neural network (GAN) have been
widely used due to their fast inference speed and lightweight networks while generating …
SafeEar: Content privacy-preserving audio deepfake detection
Text-to-Speech (TTS) and Voice Conversion (VC) models have exhibited remarkable
performance in generating realistic and natural audio. However, their dark side, audio …
SVTS: scalable video-to-speech synthesis
Video-to-speech synthesis (also known as lip-to-speech) refers to the translation of silent lip
movements into the corresponding audio. This task has received an increasing amount of …