The Song Describer Dataset: a corpus of audio captions for music-and-language evaluation

I Manco, B Weck, S Doh, M Won, Y Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality
audio-caption pairs, designed for the evaluation of music-and-language models. The …

LP-MusicCaps: LLM-based pseudo music captioning

SH Doh, K Choi, J Lee, J Nam - arXiv preprint arXiv:2307.16372, 2023 - arxiv.org
Automatic music captioning, which generates natural language descriptions for given music
tracks, holds significant potential for enhancing the understanding and organization of large …

MuseChat: A conversational music recommendation system for videos

Z Dong, X Liu, B Chen, P Polak… - Proceedings of the …, 2024 - openaccess.thecvf.com
Music recommendation for videos attracts growing interest in multi-modal research.
However, existing systems focus primarily on content compatibility, often ignoring the users' …

WikiMuTe: A web-sourced dataset of semantic descriptions for music audio

B Weck, H Kirchhoff, P Grosche, X Serra - International Conference on …, 2024 - Springer
Multi-modal deep learning techniques for matching free-form text with music have shown
promising results in the field of Music Information Retrieval (MIR). Prior work is often based …

Annotator Subjectivity in the MusicCaps Dataset.

M Lee, S Doh, D Jeong - HCMIR@ISMIR, 2023 - ceur-ws.org
Musical caption, when expressed in free-form text as opposed to more structured and limited
musical tags, often encompasses the individual characteristics of the annotator, thereby …

Zero-Shot Structure Labeling with Audio and Language Model Embeddings

M Buisson, C Ick, T Xi, B McFee - … Abstracts for the Late-Breaking Demo …, 2024 - hal.science
Recent progress on audio-based music structure analysis has closely aligned with the
appearance of new deep learning paradigms, notably for the extraction of robust spectro …