A comprehensive survey on deep music generation: Multi-level representations, algorithms, evaluations, and future directions
S Ji, J Luo, X Yang - arXiv preprint arXiv:2011.06801, 2020 - arxiv.org
The utilization of deep learning techniques in generating various contents (such as image,
text, etc.) has become a trend. Especially music, the topic of this paper, has attracted …
A review of differentiable digital signal processing for music and speech synthesis
The term “differentiable digital signal processing” describes a family of techniques in which
loss function gradients are backpropagated through digital signal processors, facilitating …
Multi-instrument music synthesis with spectrogram diffusion
An ideal music synthesizer should be both interactive and expressive, generating high-
fidelity audio in realtime for arbitrary combinations of instruments and notes. Recent neural …
Differentiable wavetable synthesis
Differentiable Wavetable Synthesis (DWTS) is a technique for neural audio synthesis which
learns a dictionary of one-period waveforms, i.e., wavetables, through end-to-end training. We …
Unsupervised cross-domain singing voice conversion
We present a wav-to-wav generative model for the task of singing voice conversion from any
identity. Our method utilizes both an acoustic model, trained for the task of automatic speech …
Neural waveshaping synthesis
We present the Neural Waveshaping Unit (NEWT): a novel, lightweight, fully causal
approach to neural audio synthesis which operates directly in the waveform domain, with an …
FastSVC: Fast cross-domain singing voice conversion with feature-wise linear modulation
This paper presents FastSVC, a light-weight cross-domain singing voice conversion (SVC)
system, which can achieve high conversion performance, with inference speed 4x faster …
I'm sorry for your loss: Spectrally-based audio distances are bad at pitch
Growing research demonstrates that synthetic failure modes imply poor generalization. We
compare commonly used audio-to-audio losses on a synthetic benchmark, measuring the …
A hierarchical speaker representation framework for one-shot singing voice conversion
Typically, singing voice conversion (SVC) depends on an embedding vector, extracted from
either a speaker lookup table (LUT) or a speaker recognition network (SRN), to model …
Latent timbre synthesis: Audio-based variational auto-encoders for music composition and sound design applications
K Tatar, D Bisig, P Pasquier - Neural Computing and Applications, 2021 - Springer
We present the Latent Timbre Synthesis, a new audio synthesis method using deep
learning. The synthesis method allows composers and sound designers to interpolate and …