Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

MuChoMusic: Evaluating music understanding in multimodal audio-language models

B Weck, I Manco, E Benetos, E Quinton… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal models that jointly process audio and language hold great promise in audio
understanding and are increasingly being adopted in the music domain. By allowing users …

VidMuse: A simple video-to-music generation framework with long-short-term modeling

Z Tian, Z Liu, R Yuan, J Pan, Q Liu, X Tan… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we systematically study music generation conditioned solely on the video. First,
we present a large-scale dataset comprising 360K video-music pairs, including various …

MuPT: A generative symbolic music pretrained transformer

X Qu, Y Bai, Y Ma, Z Zhou, KM Lo, J Liu, R Yuan… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we explore the application of Large Language Models (LLMs) to the
pre-training of music. While the prevalent use of MIDI in music modeling is well-established, our …

Train and Constrain: Phonologically Informed Tongue Twister Generation from Topics and Paraphrases

T Loakman, C Tang, C Lin - Computational Linguistics, 2024 - direct.mit.edu
Previous work in phonologically and phonetically grounded language generation has
mainly focused on domains such as puns and poetry. In this article, we present new work on …

The music maestro or the musically challenged, a massive music evaluation benchmark for large language models

J Li, L Yang, M Tang, C Chen, Z Li, P Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Benchmark plays a pivotal role in assessing the advancements of large language models
(LLMs). While numerous benchmarks have been proposed to evaluate LLMs' capabilities …

COCOLA: Coherence-oriented contrastive learning of musical audio representations

R Ciranni, G Mariani, M Mancusi, E Postolache… - arXiv preprint arXiv …, 2024 - arxiv.org
We present COCOLA (Coherence-Oriented Contrastive Learning for Audio), a contrastive
learning method for musical audio representations that captures the harmonic and rhythmic …

MAP-Neo: Highly capable and transparent bilingual large language model series

G Zhang, S Qu, J Liu, C Zhang, C Lin, CL Yu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have made great strides in recent years to achieve
unprecedented performance across different tasks. However, due to commercial interest, the …

A survey of foundation models for music understanding

W Li, Y Cai, Z Wu, W Zhang, Y Chen, R Qi… - arXiv preprint arXiv …, 2024 - arxiv.org
Music is essential in daily life, fulfilling emotional and entertainment needs, and connecting
us personally, socially, and culturally. A better understanding of music can enhance our …

Melody is all you need for music generation

S Wei, M Wei, H Wang, Y Zhao, G Kou - arXiv preprint arXiv:2409.20196, 2024 - arxiv.org
We present the Melody Guided Music Generation (MMGen) model, the first novel approach
using melody to guide the music generation that, despite a pretty simple method and …