Multimodal music information processing and retrieval: Survey and future challenges

WC Sleeman IV, R Kapoor, P Ghosh - ACM Computing Surveys, 2022 - dl.acm.org

Multimodal classification research has been gaining popularity with new datasets in
domains such as satellite imagery, biometrics, and medicine. Prior research has shown the …

Cited by 79 · Related articles · All 4 versions

[PDF] arxiv.org

What all do audio transformer models hear? probing acoustic representations for language delivery and its structure

J Shah, YK Singla, C Chen, RR Shah - arXiv preprint arXiv:2101.00387, 2021 - arxiv.org

In recent times, BERT based transformer models have become an inseparable part of
the 'tech stack' of text processing models. Similar progress is being observed in the speech …

Cited by 85 · Related articles · All 7 versions

[PDF] nsf.gov

What do audio transformers hear? probing their representations for language delivery & structure

YK Singla, J Shah, C Chen… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org

Transformer models across multiple domains such as natural language processing and
speech form an unavoidable part of the tech stack of practitioners and researchers alike. Au …

Cited by 17 · Related articles · All 4 versions

[PDF] arxiv.org

DisCover: Disentangled music representation learning for cover song identification

J Xun, S Zhang, Y Yang, J Zhu, L Deng… - Proceedings of the 46th …, 2023 - dl.acm.org

In the field of music information retrieval (MIR), cover song identification (CSI) is a
challenging task that aims to identify cover versions of a query song from a massive …

Cited by 3 · Related articles · All 4 versions

[PDF] wiley.com

Virtual Instrument Performances (VIP): A Comprehensive Review

T Kyriakou, MÁ de la Campa Crespo… - Computer Graphics …, 2024 - Wiley Online Library

Driven by recent advancements in Extended Reality (XR), the hype around the Metaverse,
and real‐time computer graphics, the transformation of the performing arts, particularly in …

Cited by 3 · Related articles · All 3 versions

[HTML] sciencedirect.com

[HTML] Late multimodal fusion for image and audio music transcription

M Alfaro-Contreras, JJ Valero-Mas, JM Iñesta… - Expert Systems with …, 2023 - Elsevier

Music transcription, which deals with the conversion of music sources into a structured
digital format, is a key problem for Music Information Retrieval (MIR). When addressing this …

Cited by 15 · Related articles · All 8 versions

[PDF] springer.com

Multimodal music datasets? Challenges and future goals in music processing

AM Christodoulou, O Lartillot, AR Jensenius - International Journal of …, 2024 - Springer

The term “multimodal music dataset” is often used to describe music-related datasets that
represent music as a multimedia art form and multimodal experience. However, the term …

Cited by 2 · Related articles · All 4 versions

Multimodal representation learning over heterogeneous networks for tag-based music retrieval

ACM da Silva, DF Silva, RM Marcacini - Expert Systems with Applications, 2022 - Elsevier

Learning how to represent data represented by features obtained from multiple modalities
through representation learning strategies has received much attention in Music Information …

Cited by 7 · Related articles · All 3 versions

[PDF] plos.org

A computational lens into how music characterizes genre in film

B Ma, T Greer, D Knox, S Narayanan - PLoS ONE, 2021 - journals.plos.org

Film music varies tremendously across genre in order to bring about different responses in
an audience. For instance, composers may evoke passion in a romantic scene with lush …

Cited by 16 · Related articles · All 13 versions

[PDF] arxiv.org

CCOM-HuQin: An annotated multimodal Chinese fiddle performance dataset

Y Zhang, Z Zhou, X Li, F Yu, M Sun - arXiv preprint arXiv:2209.06496, 2022 - arxiv.org

HuQin is a family of traditional Chinese bowed string instruments. Playing techniques (PTs)
embodied in various playing styles add abundant emotional coloring and aesthetic feelings …

Cited by 7 · Related articles · All 4 versions