Gotta go fast when generating data with score-based models
Score-based (denoising diffusion) generative models have recently gained a lot of success
in generating realistic and diverse data. These approaches define a forward diffusion …
BYOL for Audio: Self-supervised learning for general-purpose audio representation
D Niizumi, D Takeuchi, Y Ohishi… - … Joint Conference on …, 2021 - ieeexplore.ieee.org
Inspired by the recent progress in self-supervised learning for computer vision that
generates supervision using data augmentations, we explore a new general-purpose audio …
Multi-instrument music synthesis with spectrogram diffusion
An ideal music synthesizer should be both interactive and expressive, generating high-
fidelity audio in realtime for arbitrary combinations of instruments and notes. Recent neural …
Zero-shot voice conditioning for denoising diffusion TTS models
We present a novel way of conditioning a pretrained denoising diffusion speech model to
produce speech in the voice of a novel person unseen during training. The method requires …
FastS2S-VC: Streaming non-autoregressive sequence-to-sequence voice conversion
This paper proposes a non-autoregressive extension of our previously proposed sequence-
to-sequence (S2S) model-based voice conversion (VC) methods. S2S model-based VC …
Diffusion models in text generation: a survey
Diffusion models are a kind of math-based model that were first applied to image generation.
Recently, they have drawn wide interest in natural language generation (NLG), a sub-field of …
Generative probabilistic image colorization
C Furusawa, S Kitaoka, M Li, Y Odagiri - arXiv preprint arXiv:2109.14518, 2021 - arxiv.org
We propose Generative Probabilistic Image Colorization, a diffusion-based generative
process that trains a sequence of probabilistic models to reverse each step of noise …
Accent-Preserving Voice Conversion between Native-Nonnative Speakers for Second Language Learning
IL Correa, S Ueno, A Lee - 2023 Asia Pacific Signal and …, 2023 - ieeexplore.ieee.org
The use of generated corrected accent speech with a learner's own voice holds potential for
second language learners to enhance their pronunciation through self-imitation exercises …
Non-parallel voice conversion based on free-energy minimization of speaker-conditional restricted Boltzmann machine
T Kishida, T Nakashika - 2022 Asia-Pacific Signal and …, 2022 - ieeexplore.ieee.org
In this paper, we propose a non-parallel voice conversion method based on the
minimization of the free energy of a restricted Boltzmann machine (RBM). The proposed …
Diffusion Generative Vocoder for Fullband Speech Synthesis Based on Weak Third-order SDE Solver
Diffusion generative models, which generate data by the time-reverse dynamics of diffusion
processes, have attracted much attention recently, and have already been applied in the …