Sound to visual scene generation by audio-to-visual latent alignment
How does audio describe the world around us? In this paper, we propose a method for
generating an image of a scene from sound. Our method addresses the challenges of …
Sound source localization is all about cross-modal alignment
Humans can easily perceive the direction of sound sources in a visual scene, termed sound
source localization. Recent studies on learning-based sound source localization have …
Audio-visual segmentation by exploring cross-modal mutual semantics
The audio-visual segmentation (AVS) task aims to segment sounding objects from a given
video. Existing works mainly focus on fusing audio and visual features of a given video to …
AVE-CLIP: AudioCLIP-based multi-window temporal transformer for audio visual event localization
T Mahmud, D Marculescu - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
An audio-visual event (AVE) is denoted by the correspondence of the visual and auditory
signals in a video segment. Precise localization of the AVEs is very challenging since it …
BAVS: bootstrapping audio-visual segmentation by integrating foundation knowledge
Given an audio-visual pair, audio-visual segmentation (AVS) aims to locate sounding
sources by predicting pixel-wise maps. Previous methods assume that each sound …
Can CLIP Help Sound Source Localization?
Large-scale pre-trained image-text models demonstrate remarkable versatility across
diverse tasks, benefiting from their robust representational capabilities and effective …
Exploiting visual context semantics for sound source localization
Self-supervised sound source localization in unconstrained visual scenes is an important
task of audio-visual learning. In this paper, we propose a visual reasoning module to …
Vision + X: A survey on multimodal learning in the light of data
We are perceiving and communicating with the world in a multisensory manner, where
different information sources are sophisticatedly processed and interpreted by separate …
Cyclic Learning for Binaural Audio Generation and Localization
Z Li, B Zhao, Y Yuan - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Binaural audio is obtained by simulating the biological structure of human ears which plays
an important role in artificial immersive spaces. A promising approach is to utilize mono …
MarginNCE: Robust sound localization with a negative margin
The goal of this work is to localize sound sources in visual scenes with a self-supervised
approach. Contrastive learning in the context of sound source localization leverages the …