Gotta go fast when generating data with score-based models

A Jolicoeur-Martineau, K Li, R Piché-Taillefer… - arXiv preprint arXiv …, 2021 - arxiv.org
Score-based (denoising diffusion) generative models have recently gained a lot of success
in generating realistic and diverse data. These approaches define a forward diffusion …

BYOL for audio: Self-supervised learning for general-purpose audio representation

D Niizumi, D Takeuchi, Y Ohishi… - … Joint Conference on …, 2021 - ieeexplore.ieee.org
Inspired by the recent progress in self-supervised learning for computer vision that
generates supervision using data augmentations, we explore a new general-purpose audio …

Multi-instrument music synthesis with spectrogram diffusion

C Hawthorne, I Simon, A Roberts, N Zeghidour… - arXiv preprint arXiv …, 2022 - arxiv.org
An ideal music synthesizer should be both interactive and expressive, generating
high-fidelity audio in realtime for arbitrary combinations of instruments and notes. Recent neural …

Zero-shot voice conditioning for denoising diffusion TTS models

A Levkovitch, E Nachmani, L Wolf - arXiv preprint arXiv:2206.02246, 2022 - arxiv.org
We present a novel way of conditioning a pretrained denoising diffusion speech model to
produce speech in the voice of a novel person unseen during training. The method requires …

FastS2S-VC: Streaming non-autoregressive sequence-to-sequence voice conversion

H Kameoka, K Tanaka, T Kaneko - arXiv preprint arXiv:2104.06900, 2021 - arxiv.org
This paper proposes a non-autoregressive extension of our previously proposed
sequence-to-sequence (S2S) model-based voice conversion (VC) methods. S2S model-based VC …

Diffusion models in text generation: a survey

Q Yi, X Chen, C Zhang, Z Zhou, L Zhu, X Kong - PeerJ Computer Science, 2024 - peerj.com
Diffusion models are a kind of math-based model that were first applied to image generation.
Recently, they have drawn wide interest in natural language generation (NLG), a sub-field of …

Generative probabilistic image colorization

C Furusawa, S Kitaoka, M Li, Y Odagiri - arXiv preprint arXiv:2109.14518, 2021 - arxiv.org
We propose Generative Probabilistic Image Colorization, a diffusion-based generative
process that trains a sequence of probabilistic models to reverse each step of noise …

Accent-Preserving Voice Conversion between Native-Nonnative Speakers for Second Language Learning

IL Correa, S Ueno, A Lee - 2023 Asia Pacific Signal and …, 2023 - ieeexplore.ieee.org
The use of generated corrected accent speech with a learner's own voice holds potential for
second language learners to enhance their pronunciation through self-imitation exercises …

Non-parallel voice conversion based on free-energy minimization of speaker-conditional restricted Boltzmann machine

T Kishida, T Nakashika - 2022 Asia-Pacific Signal and …, 2022 - ieeexplore.ieee.org
In this paper, we propose a non-parallel voice conversion method based on the
minimization of the free energy of a restricted Boltzmann machine (RBM). The proposed …

Diffusion Generative Vocoder for Fullband Speech Synthesis Based on Weak Third-order SDE Solver

H Tachibana, M Inahara, M Go, Y Katayama… - …, 2022 - isca-archive.org
Diffusion generative models, which generate data by the time-reverse dynamics of diffusion
processes, have attracted much attention recently, and have already been applied in the …