Deep cross-modal audio-visual generation
Cross-modal audio-visual perception has been a long-lasting topic in psychology and
neurology, and various studies have discovered strong correlations in human perception of …
CMCGAN: A uniform framework for cross-modal visual-audio mutual generation
Visual and audio modalities are two symbiotic modalities underlying videos, which contain
both common and complementary information. If they can be mined and fused sufficiently …
Unaligned supervision for automatic music transcription in the wild
B Maman, AH Bermano - International Conference on …, 2022 - proceedings.mlr.press
Abstract Multi-instrument Automatic Music Transcription (AMT), or the decoding of a musical
recording into semantic musical content, is one of the holy grails of Music Information …
End-to-end sound source separation conditioned on instrument labels
Can we perform an end-to-end music source separation with a variable number of sources
using a deep learning model? This paper presents an extension of the Wave-U-Net [1] …
Investigating CNN-based Instrument Family Recognition for Western Classical Music Recordings
Western classical music comprises a rich repertoire composed for different ensembles.
Often, these ensembles consist of instruments from one or two of the families woodwinds …
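Cross-modal signal reconstruction technology for 6G
李昂, 陈建新, 魏昕, 周亮 - 通信学报 (Journal on Communications), 2022 - infocomm-journal.com
In the 6G era, to balance multimedia users' demand for an immersive audio, video, and haptic
experience with low-latency, high-reliability, high-capacity communication quality, a cross-modal signal reconstruction architecture and a deep learning model that reconstructs haptic signals from video signals are proposed …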
See and listen: Score-informed association of sound tracks to players in chamber music performance videos
Both audio and visual aspects of a musical performance, especially their association, are
important for expressing players' ideas and for engaging the audience. In this paper, we …
Multimodal transformer for parallel concatenated variational autoencoders
SD Liang, JM Mendel - arXiv preprint arXiv:2210.16174, 2022 - arxiv.org
In this paper, we propose a multimodal transformer using parallel concatenated architecture.
Instead of using patches, we use column stripes for images in R, G, B channels as the …
Audiovisual source association for string ensembles through multi-modal vibrato analysis
With the proliferation of video content of musical performances, audio-visual analysis
becomes an emerging topic in music information retrieval. Associating the audio and visual …
Guiding audio source separation by video object information
In this work we propose novel joint and sequential multimodal approaches for the task of
single channel audio source separation in videos. This is done within the popular non …