Creating a musical performance dataset for multimodal music analysis: Challenges, insights,...

L Chen, S Srivastava, Z Duan, C Xu - … of the on Thematic Workshops of …, 2017 - dl.acm.org

Cross-modal audio-visual perception has been a long-lasting topic in psychology and
neurology, and various studies have discovered strong correlations in human perception of …

Cited by 223 · Related articles · All 10 versions

CMCGAN: A uniform framework for cross-modal visual-audio mutual generation

W Hao, Z Zhang, H Guan - Proceedings of the AAAI conference on …, 2018 - ojs.aaai.org

Visual and audio modalities are two symbiotic modalities underlying videos, which contain
both common and complementary information. If they can be mined and fused sufficiently …

Cited by 86 · Related articles · All 7 versions

Unaligned supervision for automatic music transcription in the wild

B Maman, AH Bermano - International Conference on …, 2022 - proceedings.mlr.press

Multi-instrument Automatic Music Transcription (AMT), or the decoding of a musical
recording into semantic musical content, is one of the holy grails of Music Information …

Cited by 14 · Related articles · All 5 versions

End-to-end sound source separation conditioned on instrument labels

O Slizovskaia, L Kim, G Haro… - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org

Can we perform an end-to-end music source separation with a variable number of sources
using a deep learning model? This paper presents an extension of the Wave-U-Net [1] …

Cited by 39 · Related articles · All 7 versions

Investigating CNN-based Instrument Family Recognition for Western Classical Music Recordings

M Taenzer, J Abeßer, SI Mimilakis, C Weiß, M Müller… - ISMIR, 2019 - researchgate.net

Western classical music comprises a rich repertoire composed for different ensembles.
Often, these ensembles consist of instruments from one or two of the families woodwinds …

Cited by 20 · Related articles · All 3 versions

Cross-modal signal reconstruction technology for 6G

李昂, 陈建新, 魏昕, 周亮 - 通信学报 (Journal on Communications), 2022 - infocomm-journal.com

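In the 6G era, to balance multimedia users' demand for immersive audio, video, and haptic
experiences against low-latency, high-reliability, high-capacity communication quality, a cross-modal signal reconstruction architecture and a deep learning model that reconstructs haptic signals from video signals are proposed …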

Cited by 3 · Related articles · All 3 versions

See and listen: Score-informed association of sound tracks to players in chamber music performance videos

B Li, K Dinesh, Z Duan… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org

Both audio and visual aspects of a musical performance, especially their association, are
important for expressing players' ideas and for engaging the audience. In this paper, we …

Cited by 32 · Related articles · All 11 versions

Multimodal transformer for parallel concatenated variational autoencoders

SD Liang, JM Mendel - arXiv preprint arXiv:2210.16174, 2022 - arxiv.org

In this paper, we propose a multimodal transformer using parallel concatenated architecture.
Instead of using patches, we use column stripes for images in R, G, B channels as the …

Cited by 2 · Related articles · All 2 versions

Audiovisual source association for string ensembles through multi-modal vibrato analysis

B Li, C Xu, Z Duan - Proc. Sound and Music Computing (SMC …, 2017 - labsites.rochester.edu

With the proliferation of video content of musical performances, audio-visual analysis
becomes an emerging topic in music information retrieval. Associating the audio and visual …

Cited by 18 · Related articles · All 7 versions

Guiding audio source separation by video object information

S Parekh, S Essid, A Ozerov… - … IEEE Workshop on …, 2017 - ieeexplore.ieee.org

In this work we propose novel joint and sequential multimodal approaches for the task of
single channel audio source separation in videos. This is done within the popular non …

Cited by 16 · Related articles · All 5 versions