A comprehensive survey on deep music generation: Multi-level representations, algorithms, evaluations, and future directions

S Ji, J Luo, X Yang - arXiv preprint arXiv:2011.06801, 2020 - arxiv.org
The utilization of deep learning techniques in generating various contents (such as image,
text, etc.) has become a trend. Especially music, the topic of this paper, has attracted …

A review of differentiable digital signal processing for music and speech synthesis

B Hayes, J Shier, G Fazekas, A McPherson… - Frontiers in Signal …, 2024 - frontiersin.org
The term “differentiable digital signal processing” describes a family of techniques in which
loss function gradients are backpropagated through digital signal processors, facilitating …
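
As a minimal illustration of the idea in the snippet above, the sketch below backpropagates a mean-squared-error loss through a toy sinusoidal oscillator to recover the oscillator's gain. The functions `oscillator` and `fit_gain` and the hand-derived gradient are hypothetical and chosen for illustration only. Practical differentiable DSP systems rely on automatic differentiation and much richer processors.

```python
import math

def oscillator(gain, freq_hz, sample_rate, n_samples):
    """Toy 'digital signal processor': a sine oscillator with a scalar gain."""
    return [gain * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

def fit_gain(target, freq_hz, sample_rate, steps=200, lr=0.5):
    """Gradient descent on MSE(oscillator(gain), target) with respect to gain.

    The loss gradient is propagated through the oscillator by hand:
    d/dg mean((g*s - t)^2) = mean(2 * (g*s - t) * s), where s is the unit sine.
    """
    n = len(target)
    basis = oscillator(1.0, freq_hz, sample_rate, n)  # s[n] = sin(2*pi*f*n/sr)
    gain = 0.0
    for _ in range(steps):
        grad = sum(2 * (gain * s - t) * s for s, t in zip(basis, target)) / n
        gain -= lr * grad
    return gain

# Usage: recover a gain of 0.7 from a 440 Hz target (44 full cycles).
target = oscillator(0.7, 440, 16000, 1600)
learned = fit_gain(target, 440, 16000)
```

The gain here is the only trainable parameter. In a real system it would be one output of a neural network, and the same gradient would continue back into that network's weights.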

Multi-instrument music synthesis with spectrogram diffusion

C Hawthorne, I Simon, A Roberts, N Zeghidour… - arXiv preprint arXiv …, 2022 - arxiv.org
An ideal music synthesizer should be both interactive and expressive, generating
high-fidelity audio in realtime for arbitrary combinations of instruments and notes. Recent neural …

Differentiable wavetable synthesis

S Shan, L Hantrakul, J Chen, M Avent… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Differentiable Wavetable Synthesis (DWTS) is a technique for neural audio synthesis which
learns a dictionary of one-period waveforms, i.e., wavetables, through end-to-end training. We …
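
The snippet describes wavetables as one-period waveforms. The sketch below shows only how such a table is played back: a phase accumulator steps through the table at a rate set by the target frequency, and linear interpolation fills in values between entries. The fixed sine table and the names `make_sine_wavetable` and `wavetable_oscillator` are assumptions for illustration. DWTS learns its wavetables through training and is not reproduced here.

```python
import math

def make_sine_wavetable(size):
    """One period of a sine wave, stored as a lookup table."""
    return [math.sin(2 * math.pi * i / size) for i in range(size)]

def wavetable_oscillator(table, freq_hz, sample_rate, n_samples):
    """Read a single-period wavetable with a phase accumulator and
    linear interpolation between neighbouring table entries."""
    size = len(table)
    phase = 0.0                          # current position in the table
    step = freq_hz * size / sample_rate  # table entries advanced per sample
    out = []
    for _ in range(n_samples):
        i = int(phase)
        frac = phase - i
        a = table[i]
        b = table[(i + 1) % size]        # wrap around at the end of the period
        out.append(a + frac * (b - a))
        phase = (phase + step) % size
    return out

# Usage: a 100 Hz tone from a 512-entry sine table at 16 kHz.
table = make_sine_wavetable(512)
tone = wavetable_oscillator(table, 100, 16000, 160)
```

With 512 entries, linear interpolation stays very close to the exact sine. Swapping in a different one-period table changes the timbre but leaves the pitch unchanged.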

Unsupervised cross-domain singing voice conversion

A Polyak, L Wolf, Y Adi, Y Taigman - arXiv preprint arXiv:2008.02830, 2020 - arxiv.org
We present a wav-to-wav generative model for the task of singing voice conversion from any
identity. Our method utilizes both an acoustic model, trained for the task of automatic speech …

Neural waveshaping synthesis

B Hayes, C Saitis, G Fazekas - arXiv preprint arXiv:2107.05050, 2021 - arxiv.org
We present the Neural Waveshaping Unit (NEWT): a novel, lightweight, fully causal
approach to neural audio synthesis which operates directly in the waveform domain, with an …
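
Waveshaping passes an exciter signal through a nonlinear function. In NEWT the shaping functions are learned. The sketch below instead uses a fixed, classic choice, a weighted sum of Chebyshev polynomials. This is an assumed stand-in, not the paper's method. Because T_k(cos θ) = cos(kθ), a cosine exciter passed through such a shaper yields harmonic k with the chosen weight. All function names are hypothetical.

```python
import math

def chebyshev(n, x):
    """Chebyshev polynomial of the first kind T_n(x) via the recurrence
    T_0 = 1, T_1 = x, T_{k+1} = 2x T_k - T_{k-1}."""
    t_prev, t = 1.0, x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t = t, 2 * x * t - t_prev
    return t

def waveshape(x, harmonic_weights):
    """Fixed shaping function: harmonic_weights[k] weights T_{k+1}."""
    return sum(w * chebyshev(k + 1, x) for k, w in enumerate(harmonic_weights))

def waveshaping_synth(freq_hz, sample_rate, n_samples, harmonic_weights):
    """Drive the shaper with a cosine exciter at freq_hz."""
    out = []
    for n in range(n_samples):
        theta = 2 * math.pi * freq_hz * n / sample_rate
        out.append(waveshape(math.cos(theta), harmonic_weights))
    return out

# Usage: fundamental plus 2nd and 3rd harmonics at half and quarter amplitude.
y = waveshaping_synth(220, 16000, 100, [1.0, 0.5, 0.25])
```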

FastSVC: Fast cross-domain singing voice conversion with feature-wise linear modulation

S Liu, Y Cao, N Hu, D Su… - 2021 IEEE International …, 2021 - ieeexplore.ieee.org
This paper presents FastSVC, a light-weight cross-domain singing voice conversion (SVC)
system, which can achieve high conversion performance, with inference speed 4x faster …
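
Feature-wise linear modulation (FiLM), named in the title above, scales and shifts each feature channel using a per-channel γ and β computed from a conditioning input. The sketch shows only that operation. The single linear layer that predicts γ and β is a hypothetical stand-in, and where and how FastSVC applies FiLM is not described in the snippet.

```python
def film(features, gamma, beta):
    """Feature-wise linear modulation: out[c][t] = gamma[c] * x[c][t] + beta[c].
    features: list of channels, each a list of time steps.
    gamma, beta: one scale and one shift per channel."""
    return [[g * v + b for v in channel]
            for channel, g, b in zip(features, gamma, beta)]

def predict_film_params(condition, w_gamma, w_beta):
    """Hypothetical conditioning network: one bias-free linear layer mapping
    a conditioning vector to per-channel gamma and beta."""
    gamma = [sum(w * c for w, c in zip(row, condition)) for row in w_gamma]
    beta = [sum(w * c for w, c in zip(row, condition)) for row in w_beta]
    return gamma, beta

# Usage: two channels, two time steps.
modulated = film([[1.0, 2.0], [3.0, 4.0]], gamma=[2.0, 0.5], beta=[1.0, -1.0])
```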

I'm sorry for your loss: Spectrally-based audio distances are bad at pitch

J Turian, M Henry - arXiv preprint arXiv:2012.04572, 2020 - arxiv.org
Growing research demonstrates that synthetic failure modes imply poor generalization. We
compare commonly used audio-to-audio losses on a synthetic benchmark, measuring the …
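
The toy example below hints at the problem named in the title. It uses a rectangular window and sines that fall exactly on DFT bins. Under those conditions, the L1 distance between magnitude spectra is identical whether the pitch is off by one bin or by twenty. The distance therefore says nothing about the size or direction of a pitch error. This is an assumed illustration, not the benchmark used in the paper, and the helper names are hypothetical.

```python
import cmath
import math

def magnitude_spectrum(signal):
    """Magnitudes of a naive DFT over bins 0..N/2 (O(N^2), fine for a toy)."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal)))
            for k in range(n // 2 + 1)]

def sine(bin_index, n):
    """A sine whose frequency falls exactly on DFT bin `bin_index`."""
    return [math.sin(2 * math.pi * bin_index * t / n) for t in range(n)]

def spectral_l1(a, b):
    """L1 distance between the magnitude spectra of two signals."""
    return sum(abs(x - y)
               for x, y in zip(magnitude_spectrum(a), magnitude_spectrum(b)))

# Usage: reference at bin 10, compared with sines 1 bin and 20 bins away.
N = 128
ref = sine(10, N)
near = spectral_l1(ref, sine(11, N))
far = spectral_l1(ref, sine(30, N))
```

Both distances come out at N (two non-overlapping peaks of height N/2), so a gradient-based optimizer receives no signal pulling the pitch toward the target.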

A hierarchical speaker representation framework for one-shot singing voice conversion

X Li, S Liu, Y Shan - arXiv preprint arXiv:2206.13762, 2022 - arxiv.org
Typically, singing voice conversion (SVC) depends on an embedding vector, extracted from
either a speaker lookup table (LUT) or a speaker recognition network (SRN), to model …

Latent timbre synthesis: Audio-based variational auto-encoders for music composition and sound design applications

K Tatar, D Bisig, P Pasquier - Neural Computing and Applications, 2021 - Springer
We present the Latent Timbre Synthesis, a new audio synthesis method using deep
learning. The synthesis method allows composers and sound designers to interpolate and …
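
Interpolation in a variational auto-encoder usually means walking between two latent vectors and decoding each intermediate point. The sketch assumes plain linear interpolation and omits the decoder. The snippet does not say which interpolation scheme Latent Timbre Synthesis uses, and `interpolate_latents` is a hypothetical name.

```python
def interpolate_latents(z_start, z_end, n_steps):
    """Evenly spaced points on the straight line between two latent vectors.
    Each point would be passed to a trained decoder (not shown) to synthesize audio."""
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    path = []
    for i in range(n_steps):
        t = i / (n_steps - 1)
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path

# Usage: three points from z_start to z_end, endpoints included.
path = interpolate_latents([0.0, 0.0], [1.0, 2.0], 3)
```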