PriorGrad: Improving conditional denoising diffusion models with data-dependent adaptive prior

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier

The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Cited by 141 · Related articles · All 6 versions

[PDF] arxiv.org

AudioLDM: Text-to-audio generation with latent diffusion models

H Liu, Z Chen, Y Yuan, X Mei, X Liu, D Mandic… - arXiv preprint arXiv …, 2023 - arxiv.org

Text-to-audio (TTA) system has recently gained attention for its ability to synthesize general
audio based on text descriptions. However, previous studies in TTA have limited generation …

Cited by 378 · Related articles · All 7 versions

[PDF] arxiv.org

Diffsound: Discrete diffusion model for text-to-sound generation

D Yang, J Yu, H Wang, W Wang… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org

Generating sound effects that people want is an important topic. However, there are limited
studies in this area for sound generation. In this study, we investigate generating sound …

Cited by 244 · Related articles · All 4 versions

[PDF] thecvf.com

Deblurring via stochastic refinement

J Whang, M Delbracio, H Talebi… - Proceedings of the …, 2022 - openaccess.thecvf.com

Image deblurring is an ill-posed problem with multiple plausible solutions for a given input
image. However, most existing methods produce a deterministic estimate of the clean image …

Cited by 232 · Related articles · All 10 versions

[PDF] arxiv.org

A survey on neural speech synthesis

X Tan, T Qin, F Soong, TY Liu - arXiv preprint arXiv:2106.15561, 2021 - arxiv.org

Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …

Cited by 401 · Related articles · All 2 versions

[PDF] neurips.cc

Riemannian score-based generative modelling

V De Bortoli, E Mathieu, M Hutchinson… - Advances in …, 2022 - proceedings.neurips.cc

Score-based generative models (SGMs) are a powerful class of generative models that
exhibit remarkable empirical performance. Score-based generative modelling (SGM) …

Cited by 157 · Related articles · All 13 versions

[PDF] arxiv.org

BigVGAN: A universal neural vocoder with large-scale training

S Lee, W Ping, B Ginsburg, B Catanzaro… - arXiv preprint arXiv …, 2022 - arxiv.org

Despite recent progress in generative adversarial network (GAN)-based vocoders, where
the model generates raw waveform conditioned on acoustic features, it is challenging to …

Cited by 165 · Related articles · All 5 versions

[PDF] arxiv.org

AudioLDM 2: Learning holistic audio generation with self-supervised pretraining

H Liu, Y Yuan, X Liu, X Mei, Q Kong… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Although audio generation shares commonalities across different types of audio, such as
speech, music, and sound effects, designing models for each type requires careful …

Cited by 74 · Related articles · All 5 versions

[PDF] thecvf.com

Generating visual scenes from touch

F Yang, J Zhang, A Owens - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

An emerging line of work has sought to generate plausible imagery from touch. Existing
approaches, however, tackle only narrow aspects of the visuo-tactile synthesis problem, and …

Cited by 17 · Related articles · All 6 versions

[PDF] neurips.cc

StyleTTS 2: Towards human-level text-to-speech through style diffusion and adversarial training with large speech language models

YA Li, C Han, V Raghavan… - Advances in Neural …, 2024 - proceedings.neurips.cc

In this paper, we present StyleTTS 2, a text-to-speech (TTS) model that leverages style
diffusion and adversarial training with large speech language models (SLMs) to achieve …

Cited by 50 · Related articles · All 6 versions