Multimodal classification: Current landscape, taxonomy and future directions

WC Sleeman IV, R Kapoor, P Ghosh - ACM Computing Surveys, 2022 - dl.acm.org
Multimodal classification research has been gaining popularity with new datasets in
domains such as satellite imagery, biometrics, and medicine. Prior research has shown the …

What all do audio transformer models hear? probing acoustic representations for language delivery and its structure

J Shah, YK Singla, C Chen, RR Shah - arXiv preprint arXiv:2101.00387, 2021 - arxiv.org
In recent times, BERT based transformer models have become an inseparable part of
the 'tech stack' of text processing models. Similar progress is being observed in the speech …

What do audio transformers hear? probing their representations for language delivery & structure

YK Singla, J Shah, C Chen… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Transformer models across multiple domains such as natural language processing and
speech form an unavoidable part of the tech stack of practitioners and researchers alike. Au …

Discover: Disentangled music representation learning for cover song identification

J Xun, S Zhang, Y Yang, J Zhu, L Deng… - Proceedings of the 46th …, 2023 - dl.acm.org
In the field of music information retrieval (MIR), cover song identification (CSI) is a
challenging task that aims to identify cover versions of a query song from a massive …

Virtual Instrument Performances (VIP): A Comprehensive Review

T Kyriakou, MÁ de la Campa Crespo… - Computer Graphics …, 2024 - Wiley Online Library
Driven by recent advancements in Extended Reality (XR), the hype around the Metaverse,
and real‐time computer graphics, the transformation of the performing arts, particularly in …

Late multimodal fusion for image and audio music transcription

M Alfaro-Contreras, JJ Valero-Mas, JM Iñesta… - Expert Systems with …, 2023 - Elsevier
Music transcription, which deals with the conversion of music sources into a structured
digital format, is a key problem for Music Information Retrieval (MIR). When addressing this …

Multimodal music datasets? Challenges and future goals in music processing

AM Christodoulou, O Lartillot, AR Jensenius - International Journal of …, 2024 - Springer
The term “multimodal music dataset” is often used to describe music-related datasets that
represent music as a multimedia art form and multimodal experience. However, the term …

Multimodal representation learning over heterogeneous networks for tag-based music retrieval

ACM da Silva, DF Silva, RM Marcacini - Expert Systems with Applications, 2022 - Elsevier
Learning how to represent data represented by features obtained from multiple modalities
through representation learning strategies has received much attention in Music Information …

A computational lens into how music characterizes genre in film

B Ma, T Greer, D Knox, S Narayanan - PloS one, 2021 - journals.plos.org
Film music varies tremendously across genre in order to bring about different responses in
an audience. For instance, composers may evoke passion in a romantic scene with lush …

CCOM-HuQin: An annotated multimodal Chinese fiddle performance dataset

Y Zhang, Z Zhou, X Li, F Yu, M Sun - arXiv preprint arXiv:2209.06496, 2022 - arxiv.org
HuQin is a family of traditional Chinese bowed string instruments. Playing techniques (PTs)
embodied in various playing styles add abundant emotional coloring and aesthetic feelings …