- Academic resource search

Deep learning: Applications, architectures, models, tools, and frameworks: A comprehensive survey

M Gheisari, F Ebrahimzadeh, M Rahimi… - CAAI Transactions …, 2023 - Wiley Online Library

Deep Learning (DL) is a subfield of machine learning that significantly impacts extracting
new knowledge. By using DL, the extraction of advanced data representations and …

Cited by 102 · All 5 versions

[PDF] arxiv.org

Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward

M Masood, M Nawaz, KM Malik, A Javed, A Irtaza… - Applied …, 2023 - Springer

Easy access to audio-visual content on social media, combined with the availability of
modern tools such as Tensorflow or Keras, and open-source trained models, along with …

Cited by 338 · All 11 versions

[PDF] arxiv.org

DiffWave: A versatile diffusion model for audio synthesis

Z Kong, W Ping, J Huang, K Zhao… - arXiv preprint arXiv …, 2020 - arxiv.org

In this work, we propose DiffWave, a versatile diffusion probabilistic model for conditional
and unconditional waveform generation. The model is non-autoregressive, and converts the …

Cited by 1249 · All 3 versions
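The DiffWave snippet describes a diffusion probabilistic model. The forward (noising) half of such a model has a closed form. With beta_t as the noise schedule and a_bar_t = prod_{s<=t} (1 - beta_s), a clean waveform x_0 is noised as x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps, where eps ~ N(0, I).

The sketch below shows only that step. It rests on several illustrative assumptions:
- the schedule is linear;
- its endpoints are 1e-4 and 0.05 over 50 steps;
- the test tone is 440 Hz at 16 kHz;
- the denoising network that learns to predict eps is omitted.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.05):
    """Noise variances beta_1..beta_T spaced linearly (endpoints are illustrative assumptions)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products a_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, a_bars, rng):
    """Draw x_t ~ q(x_t | x_0) in one shot; t is 1-indexed. Returns (x_t, eps)."""
    if not 1 <= t <= len(a_bars):
        raise ValueError("t must lie in 1..T")
    a = a_bars[t - 1]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


# Example: noise 10 ms of a 440 Hz tone (16 kHz sample rate assumed) to step t = 25 of 50.
tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(160)]
betas = linear_beta_schedule(50)
a_bars = alpha_bars(betas)
noisy, eps = q_sample(tone, 25, a_bars, random.Random(0))
```

A denoiser would be trained to recover eps from (x_t, t). Like this forward step, the reverse steps in DiffWave update every waveform sample in parallel rather than one sample at a time, which is the sense in which the model is non-autoregressive.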

[PDF] arxiv.org

WaveGrad: Estimating gradients for waveform generation

N Chen, Y Zhang, H Zen, RJ Weiss, M Norouzi… - arXiv preprint arXiv …, 2020 - arxiv.org

This paper introduces WaveGrad, a conditional model for waveform generation which
estimates gradients of the data density. The model is built on prior work on score matching …

Cited by 755 · All 6 versions
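The WaveGrad snippet mentions estimating gradients of the data density, that is, the score grad_x log p(x). To show how a score can drive sampling, the sketch below makes two substitutions:
- It swaps in unadjusted Langevin dynamics, a textbook score-based sampler. This is not WaveGrad's exact update.
- It replaces the learned, spectrogram-conditioned network with the analytic score of a 1-D Gaussian. The values mu = 2.0 and sigma = 0.5 are arbitrary.

```python
import math
import random


def gaussian_score(x, mu, sigma):
    """grad_x log N(x; mu, sigma^2): an analytic stand-in for a learned score network."""
    return -(x - mu) / (sigma ** 2)


def langevin_sample(score_fn, x_init, step_size, num_steps, rng):
    """Unadjusted Langevin dynamics: step along the score and inject Gaussian noise."""
    x = x_init
    for _ in range(num_steps):
        z = rng.gauss(0.0, 1.0)
        x = x + 0.5 * step_size * score_fn(x) + math.sqrt(step_size) * z
    return x


# Start 500 chains at 0.0 and let the score pull them toward the target density.
rng = random.Random(0)
score = lambda x: gaussian_score(x, mu=2.0, sigma=0.5)
samples = [langevin_sample(score, 0.0, 0.01, 1000, rng) for _ in range(500)]
```

With this small step size the chains end up distributed approximately as N(2.0, 0.5^2). Larger steps bias the stationary variance, which is one reason practical samplers tune their noise schedules carefully.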

[PDF] pubpub.org

[PDF] Jukebox: A generative model for music

P Dhariwal, H Jun, C Payne, JW Kim… - arXiv preprint arXiv …, 2020 - assets.pubpub.org

We introduce Jukebox, a model that generates music with singing in the raw audio domain.
We tackle the long context of raw audio using a multiscale VQ-VAE to compress it to discrete …

Cited by 797 · All 8 versions
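The Jukebox snippet says a VQ-VAE compresses raw audio "to discrete …" codes. The core of that step is vector quantization: each continuous encoder output is replaced by the index of its nearest codebook vector.

The sketch below shows only that lookup, using a toy 2-D codebook chosen for illustration. The encoder, the decoder, codebook learning and Jukebox's multiscale levels are all omitted.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quantize(latents, codebook):
    """Map each continuous latent vector to the index of its nearest codebook entry."""
    codes = []
    for z in latents:
        best = min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        codes.append(best)
    return codes


def dequantize(codes, codebook):
    """Look the discrete codes back up to get the vectors the decoder would receive."""
    return [list(codebook[k]) for k in codes]


codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]  # toy codebook (assumption)
latents = [[0.1, -0.1], [0.9, 1.2], [-0.8, 1.7]]  # stand-in encoder outputs
codes = quantize(latents, codebook)
```

The resulting integer sequence, rather than the raw samples, is what a downstream prior models. In the paper, autoregressive Transformers play that role.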

[PDF] arxiv.org

Synthetic Data – what, why and how?

J Jordon, L Szpruch, F Houssiau, M Bottarelli… - arXiv preprint arXiv …, 2022 - arxiv.org

This explainer document aims to provide an overview of the current state of the rapidly
expanding work on synthetic data technologies, with a particular focus on privacy. The …

Cited by 156 · All 5 versions

[PDF] arxiv.org

LibriTTS: A corpus derived from LibriSpeech for text-to-speech

H Zen, V Dang, R Clark, Y Zhang, RJ Weiss… - arXiv preprint arXiv …, 2019 - arxiv.org

This paper introduces a new speech corpus called "LibriTTS" designed for text-to-speech
use. It is derived from the original audio and text materials of the LibriSpeech corpus, which …

Cited by 905 · All 10 versions

[PDF] aaai.org

Neural speech synthesis with transformer network

N Li, S Liu, Y Liu, S Zhao, M Liu - … of the AAAI conference on artificial …, 2019 - ojs.aaai.org

Although end-to-end neural text-to-speech (TTS) methods (such as Tacotron2) are proposed
and achieve state-of-the-art performance, they still suffer from two problems: 1) low efficiency …

Cited by 855 · All 10 versions

[PDF] sigport.org

Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions

J Shen, R Pang, RJ Weiss, M Schuster… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org

This paper describes Tacotron 2, a neural network architecture for speech synthesis directly
from text. The system is composed of a recurrent sequence-to-sequence feature prediction …

Cited by 3266 · All 8 versions
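The Tacotron 2 title refers to mel spectrogram predictions. A mel spectrogram pools short-time Fourier magnitudes through triangular filters whose centers are evenly spaced on the mel scale.

The sketch below computes those band edges and one filter's weight. It rests on these assumptions:
- It uses the HTK-style formula m = 2595 * log10(1 + f / 700). Some libraries default to the Slaney variant instead.
- The example settings (80 bands from 125 Hz to 7.6 kHz) are illustrative and are not taken from this listing.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style mel scale (an assumed convention; other variants exist)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_mels, f_min, f_max):
    """n_mels + 2 frequencies in Hz, equally spaced in mel; triples (i, i+1, i+2) define filter i."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


def triangle_weight(f, left, center, right):
    """Weight of a triangular mel filter at frequency f (peaks at 1.0 on the center)."""
    if f <= left or f >= right:
        return 0.0
    if f <= center:
        return (f - left) / (center - left)
    return (right - f) / (right - center)


edges = mel_band_edges(80, 125.0, 7600.0)  # illustrative settings
```

Because the spacing is uniform in mel rather than in Hz, the low-frequency filters come out narrow and the high-frequency ones wide.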

[PDF] neurips.cc

Transfer learning from speaker verification to multispeaker text-to-speech synthesis

Y Jia, Y Zhang, R Weiss, Q Wang… - Advances in neural …, 2018 - proceedings.neurips.cc

We describe a neural network-based system for text-to-speech (TTS) synthesis that is able to
generate speech audio in the voice of many different speakers, including those unseen …

Cited by 976 · All 8 versions