Deep audio-visual learning: A survey

H Zhu, MD Luo, R Wang, AH Zheng, R He - International Journal of …, 2021 - Springer
Audio-visual learning, aimed at exploiting the relationship between audio and visual
modalities, has drawn considerable attention since deep learning started to be used …

Learning to separate object sounds by watching unlabeled video

R Gao, R Feris, K Grauman - Proceedings of the European …, 2018 - openaccess.thecvf.com
Perceiving a scene most fully requires all the senses. Yet modeling how objects look and
sound is challenging: most natural scenes and events contain multiple objects, and the …

2.5D visual sound

R Gao, K Grauman - … of the IEEE/CVF Conference on …, 2019 - openaccess.thecvf.com
Binaural audio provides a listener with 3D sound sensation, allowing a rich perceptual
experience of the scene. However, binaural recordings are scarcely available and require …

Co-separating sounds of visual objects

R Gao, K Grauman - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com
Learning how objects sound from video is challenging, since they often heavily overlap in a
single audio channel. Current methods for visually-guided audio source separation sidestep …

Deep cross-modal audio-visual generation

L Chen, S Srivastava, Z Duan, C Xu - … of the on Thematic Workshops of …, 2017 - dl.acm.org
Cross-modal audio-visual perception has been a long-lasting topic in psychology and
neurology, and various studies have discovered strong correlations in human perception of …

Creating a multitrack classical music performance dataset for multimodal music analysis: Challenges, insights, and applications

B Li, X Liu, K Dinesh, Z Duan… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
We introduce a dataset for facilitating audio-visual analysis of music performances. The
dataset comprises 44 simple multi-instrument classical music pieces assembled from …

Move2Hear: Active audio-visual source separation

S Majumder, Z Al-Halah… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
We introduce the active audio-visual source separation problem, where an agent must move
intelligently in order to better isolate the sounds coming from an object of interest in its …

Multimodal music information processing and retrieval: Survey and future challenges

F Simonetta, S Ntalampiras… - … workshop on multilayer …, 2019 - ieeexplore.ieee.org
Towards improving the performance in various music information processing tasks, recent
studies exploit different modalities able to capture diverse aspects of music. Such modalities …

Audiovisual analysis of music performances: Overview of an emerging field

Z Duan, S Essid, CCS Liem, G Richard… - IEEE Signal …, 2018 - ieeexplore.ieee.org
In the physical sciences and engineering domains, music has traditionally been considered
an acoustic phenomenon. From a perceptual viewpoint, music is naturally associated with …

Active audio-visual separation of dynamic sound sources

S Majumder, K Grauman - European Conference on Computer Vision, 2022 - Springer
We explore active audio-visual separation for dynamic sound sources, where an embodied
agent moves intelligently in a 3D environment to continuously isolate the time-varying audio …