Deep learning: Applications, architectures, models, tools, and frameworks: A comprehensive survey

M Gheisari, F Ebrahimzadeh, M Rahimi… - CAAI Transactions …, 2023 - Wiley Online Library
Deep Learning (DL) is a subfield of machine learning that significantly impacts extracting
new knowledge. By using DL, the extraction of advanced data representations and …

Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward

M Masood, M Nawaz, KM Malik, A Javed, A Irtaza… - Applied …, 2023 - Springer
Easy access to audio-visual content on social media, combined with the availability of
modern tools such as Tensorflow or Keras, and open-source trained models, along with …

DiffWave: A versatile diffusion model for audio synthesis

Z Kong, W Ping, J Huang, K Zhao… - arXiv preprint arXiv …, 2020 - arxiv.org
In this work, we propose DiffWave, a versatile diffusion probabilistic model for conditional
and unconditional waveform generation. The model is non-autoregressive, and converts the …

WaveGrad: Estimating gradients for waveform generation

N Chen, Y Zhang, H Zen, RJ Weiss, M Norouzi… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper introduces WaveGrad, a conditional model for waveform generation which
estimates gradients of the data density. The model is built on prior work on score matching …

Jukebox: A generative model for music

P Dhariwal, H Jun, C Payne, JW Kim… - arXiv preprint arXiv …, 2020 - assets.pubpub.org
We introduce Jukebox, a model that generates music with singing in the raw audio domain.
We tackle the long context of raw audio using a multiscale VQ-VAE to compress it to discrete …

Synthetic Data--what, why and how?

J Jordon, L Szpruch, F Houssiau, M Bottarelli… - arXiv preprint arXiv …, 2022 - arxiv.org
This explainer document aims to provide an overview of the current state of the rapidly
expanding work on synthetic data technologies, with a particular focus on privacy. The …

LibriTTS: A corpus derived from LibriSpeech for text-to-speech

H Zen, V Dang, R Clark, Y Zhang, RJ Weiss… - arXiv preprint arXiv …, 2019 - arxiv.org
This paper introduces a new speech corpus called "LibriTTS" designed for text-to-speech
use. It is derived from the original audio and text materials of the LibriSpeech corpus, which …

Neural speech synthesis with transformer network

N Li, S Liu, Y Liu, S Zhao, M Liu - … of the AAAI conference on artificial …, 2019 - ojs.aaai.org
Although end-to-end neural text-to-speech (TTS) methods (such as Tacotron2) are proposed
and achieve state-of-the-art performance, they still suffer from two problems: 1) low efficiency …

Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions

J Shen, R Pang, RJ Weiss, M Schuster… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
This paper describes Tacotron 2, a neural network architecture for speech synthesis directly
from text. The system is composed of a recurrent sequence-to-sequence feature prediction …

Transfer learning from speaker verification to multispeaker text-to-speech synthesis

Y Jia, Y Zhang, R Weiss, Q Wang… - Advances in neural …, 2018 - proceedings.neurips.cc
We describe a neural network-based system for text-to-speech (TTS) synthesis that is able to
generate speech audio in the voice of many different speakers, including those unseen …