A comprehensive survey on deep music generation: Multi-level representations, algorithms, evaluations, and future directions

S Ji, J Luo, X Yang - arXiv preprint arXiv:2011.06801, 2020 - arxiv.org
The utilization of deep learning techniques in generating various contents (such as image,
text, etc.) has become a trend. Especially music, the topic of this paper, has attracted …

Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

MusicBERT: Symbolic music understanding with large-scale pre-training

M Zeng, X Tan, R Wang, Z Ju, T Qin, TY Liu - arXiv preprint arXiv …, 2021 - arxiv.org
Symbolic music understanding, which refers to the understanding of music from the symbolic
data (e.g., MIDI format, but not audio), covers many music applications such as genre …

MT3: Multi-task multitrack music transcription

J Gardner, I Simon, E Manilow, C Hawthorne… - arXiv preprint arXiv …, 2021 - arxiv.org
Automatic Music Transcription (AMT), inferring musical notes from raw audio, is a
challenging task at the core of music understanding. Unlike Automatic Speech Recognition …

EMOPIA: A multi-modal pop piano dataset for emotion recognition and emotion-based music generation

HT Hung, J Ching, S Doh, N Kim, J Nam… - arXiv preprint arXiv …, 2021 - arxiv.org
While there are many music datasets with emotion labels in the literature, they cannot be
used for research on symbolic-domain music analysis or generation, as there are usually …

Sequence-to-sequence piano transcription with transformers

C Hawthorne, I Simon, R Swavely, E Manilow… - arXiv preprint arXiv …, 2021 - arxiv.org
Automatic Music Transcription has seen significant progress in recent years by training
custom deep neural networks on large datasets. However, these models have required …

GiantMIDI-Piano: A large-scale MIDI dataset for classical piano music

Q Kong, B Li, J Chen, Y Wang - arXiv preprint arXiv:2010.07061, 2020 - arxiv.org
Symbolic music datasets are important for music information retrieval and musical analysis.
However, there is a lack of large-scale symbolic datasets for classical piano music. In this …

MidiBERT-Piano: Large-scale pre-training for symbolic music understanding

YH Chou, I Chen, CJ Chang, J Ching… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper presents an attempt to employ the masked language modeling approach of BERT
to pre-train a 12-layer Transformer model over 4,166 pieces of polyphonic piano MIDI files …
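The snippet names BERT's masked language modeling as the pre-training objective. The following is a minimal, generic sketch of BERT-style token masking applied to a flat sequence of integer token ids; it is not the MidiBERT authors' code. The token ids, MASK_ID, the 15% rate and the 80/10/10 split are assumptions taken from the common BERT recipe, and real MIDI tokenizations (e.g. compound tokens) would need per-field handling that this sketch omits.

```python
import random
from typing import List, Tuple

MASK_ID = 0    # hypothetical id reserved for the [MASK] token
IGNORE = -100  # label value for positions excluded from the loss


def mask_tokens(tokens: List[int], vocab_size: int, mask_prob: float = 0.15,
                seed: int = 0) -> Tuple[List[int], List[int]]:
    """Generic BERT-style masking (assumed recipe, not MidiBERT's exact code).

    Each position is selected with probability mask_prob. A selected position
    becomes MASK_ID 80% of the time, a random non-mask token 10% of the time,
    and stays unchanged 10% of the time. Labels hold the original token at
    selected positions and IGNORE everywhere else.
    """
    rng = random.Random(seed)
    inputs = list(tokens)
    labels = [IGNORE] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID
        elif r < 0.9:
            inputs[i] = rng.randrange(1, vocab_size)  # random token, never MASK_ID
        # otherwise keep the original token in the input
    return inputs, labels
```

A Transformer would then be trained to predict `labels` at the non-IGNORE positions from `inputs`; leaving some selected tokens unchanged or randomized is the usual way to keep pre-training inputs closer to what the model sees during fine-tuning.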

Automatic piano transcription with hierarchical frequency-time transformer

K Toyama, T Akama, Y Ikemiya, Y Takida… - arXiv preprint arXiv …, 2023 - arxiv.org
Taking long-term spectral and temporal dependencies into account is essential for automatic
piano transcription. This is especially helpful when determining the precise onset and offset …

CLaMP: Contrastive language-music pre-training for cross-modal symbolic music information retrieval

S Wu, D Yu, X Tan, M Sun - arXiv preprint arXiv:2304.11029, 2023 - arxiv.org
We introduce CLaMP: Contrastive Language-Music Pre-training, which learns cross-modal
representations between natural language and symbolic music using a music encoder and …
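The snippet describes contrastive pre-training that aligns text and symbolic-music representations. A common form of such an objective is the symmetric, CLIP-style InfoNCE loss over a batch of matched pairs, sketched below in pure Python. This is an assumed, generic formulation rather than CLaMP's published code: the embeddings stand in for encoder outputs, and the temperature value of 0.07 is a hypothetical default.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    # L2-normalize; assumes v is not the zero vector
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def _log_softmax(row: Vector) -> Vector:
    # numerically stable log-softmax via the log-sum-exp trick
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def clip_style_loss(text_emb: List[Vector], music_emb: List[Vector],
                    temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: pair i of text_emb matches pair i of music_emb."""
    t = [_normalize(v) for v in text_emb]
    m = [_normalize(v) for v in music_emb]
    n = len(t)
    # cosine-similarity logits, scaled by the (assumed) temperature
    logits = [[sum(a * b for a, b in zip(t[i], m[j])) / temperature
               for j in range(n)] for i in range(n)]
    # text -> music: in row i the correct music item is column i
    loss_t2m = -sum(_log_softmax(logits[i])[i] for i in range(n)) / n
    # music -> text: in column j the correct text item is row j
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_m2t = -sum(_log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_t2m + loss_m2t) / 2
```

With correctly paired embeddings the loss approaches zero, while mismatched pairs yield a large loss; minimizing it pulls each text description toward its own piece and away from the other pieces in the batch.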