VoiceGrad: Non-parallel any-to-many voice conversion with annealed Langevin dynamics

A Jolicoeur-Martineau, K Li, R Piché-Taillefer… - arXiv preprint arXiv …, 2021 - arxiv.org

Score-based (denoising diffusion) generative models have recently gained a lot of success
in generating realistic and diverse data. These approaches define a forward diffusion …

Cited by 221 · Related articles · All 3 versions

BYOL for Audio: Self-supervised learning for general-purpose audio representation

D Niizumi, D Takeuchi, Y Ohishi… - … Joint Conference on …, 2021 - ieeexplore.ieee.org

Inspired by the recent progress in self-supervised learning for computer vision that
generates supervision using data augmentations, we explore a new general-purpose audio …

Cited by 196 · Related articles · All 5 versions

Multi-instrument music synthesis with spectrogram diffusion

C Hawthorne, I Simon, A Roberts, N Zeghidour… - arXiv preprint arXiv …, 2022 - arxiv.org

An ideal music synthesizer should be both interactive and expressive, generating high-
fidelity audio in realtime for arbitrary combinations of instruments and notes. Recent neural …

Cited by 64 · Related articles · All 4 versions

Zero-shot voice conditioning for denoising diffusion TTS models

A Levkovitch, E Nachmani, L Wolf - arXiv preprint arXiv:2206.02246, 2022 - arxiv.org

We present a novel way of conditioning a pretrained denoising diffusion speech model to
produce speech in the voice of a novel person unseen during training. The method requires …

Cited by 25 · Related articles · All 5 versions

FastS2S-VC: Streaming non-autoregressive sequence-to-sequence voice conversion

H Kameoka, K Tanaka, T Kaneko - arXiv preprint arXiv:2104.06900, 2021 - arxiv.org

This paper proposes a non-autoregressive extension of our previously proposed sequence-
to-sequence (S2S) model-based voice conversion (VC) methods. S2S model-based VC …

Cited by 23 · Related articles · All 2 versions

Diffusion models in text generation: a survey

Q Yi, X Chen, C Zhang, Z Zhou, L Zhu, X Kong - PeerJ Computer Science, 2024 - peerj.com

Diffusion models are a kind of math-based model that were first applied to image generation.
Recently, they have drawn wide interest in natural language generation (NLG), a sub-field of …

Cited by 5 · Related articles · All 6 versions

Generative probabilistic image colorization

C Furusawa, S Kitaoka, M Li, Y Odagiri - arXiv preprint arXiv:2109.14518, 2021 - arxiv.org

We propose Generative Probabilistic Image Colorization, a diffusion-based generative
process that trains a sequence of probabilistic models to reverse each step of noise …

Cited by 6 · Related articles · All 3 versions

Accent-Preserving Voice Conversion between Native-Nonnative Speakers for Second Language Learning

IL Correa, S Ueno, A Lee - 2023 Asia Pacific Signal and …, 2023 - ieeexplore.ieee.org

The use of generated corrected accent speech with a learner's own voice holds potential for
second language learners to enhance their pronunciation through self-imitation exercises …

Non-parallel voice conversion based on free-energy minimization of speaker-conditional restricted Boltzmann machine

T Kishida, T Nakashika - 2022 Asia-Pacific Signal and …, 2022 - ieeexplore.ieee.org

In this paper, we propose a non-parallel voice conversion method based on the
minimization of the free energy of a restricted Boltzmann machine (RBM). The proposed …

Cited by 1 · Related articles · All 3 versions

[PDF] Diffusion Generative Vocoder for Fullband Speech Synthesis Based on Weak Third-order SDE Solver

H Tachibana, M Inahara, M Go, Y Katayama… - …, 2022 - isca-archive.org

Diffusion generative models, which generate data by the time-reverse dynamics of diffusion
processes, have attracted much attention recently, and have already been applied in the …