Foundation models for music: A survey
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …
Muchomusic: Evaluating music understanding in multimodal audio-language models
Multimodal models that jointly process audio and language hold great promise in audio
understanding and are increasingly being adopted in the music domain. By allowing users …
Vidmuse: A simple video-to-music generation framework with long-short-term modeling
In this work, we systematically study music generation conditioned solely on the video. First,
we present a large-scale dataset comprising 360K video-music pairs, including various …
Mupt: A generative symbolic music pretrained transformer
In this paper, we explore the application of Large Language Models (LLMs) to the
pre-training of music. While the prevalent use of MIDI in music modeling is well-established, our …
Train and Constrain: Phonologically Informed Tongue Twister Generation from Topics and Paraphrases
Previous work in phonologically and phonetically grounded language generation has
mainly focused on domains such as puns and poetry. In this article, we present new work on …
The music maestro or the musically challenged, a massive music evaluation benchmark for large language models
Benchmark plays a pivotal role in assessing the advancements of large language models
(LLMs). While numerous benchmarks have been proposed to evaluate LLMs' capabilities …
Cocola: Coherence-oriented contrastive learning of musical audio representations
We present COCOLA (Coherence-Oriented Contrastive Learning for Audio), a contrastive
learning method for musical audio representations that captures the harmonic and rhythmic …
Map-neo: Highly capable and transparent bilingual large language model series
Large Language Models (LLMs) have made great strides in recent years to achieve
unprecedented performance across different tasks. However, due to commercial interest, the …
A survey of foundation models for music understanding
W Li, Y Cai, Z Wu, W Zhang, Y Chen, R Qi… - arXiv preprint arXiv …, 2024 - arxiv.org
Music is essential in daily life, fulfilling emotional and entertainment needs, and connecting
us personally, socially, and culturally. A better understanding of music can enhance our …
Melody is all you need for music generation
We present the Melody Guided Music Generation (MMGen) model, the first novel approach
using melody to guide the music generation that, despite a pretty simple method and …