Sound to visual scene generation by audio-to-visual latent alignment

K Sung-Bin, A Senocak, H Ha… - Proceedings of the …, 2023 - openaccess.thecvf.com
How does audio describe the world around us? In this paper, we propose a method for
generating an image of a scene from sound. Our method addresses the challenges of …

Sound source localization is all about cross-modal alignment

A Senocak, H Ryu, J Kim, TH Oh… - Proceedings of the …, 2023 - openaccess.thecvf.com
Humans can easily perceive the direction of sound sources in a visual scene, termed sound
source localization. Recent studies on learning-based sound source localization have …

Audio-visual segmentation by exploring cross-modal mutual semantics

C Liu, PP Li, X Qi, H Zhang, L Li, D Wang… - Proceedings of the 31st …, 2023 - dl.acm.org
The audio-visual segmentation (AVS) task aims to segment sounding objects from a given
video. Existing works mainly focus on fusing audio and visual features of a given video to …

AVE-CLIP: AudioCLIP-based multi-window temporal transformer for audio visual event localization

T Mahmud, D Marculescu - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
An audio-visual event (AVE) is denoted by the correspondence of the visual and auditory
signals in a video segment. Precise localization of the AVEs is very challenging since it …

BAVS: bootstrapping audio-visual segmentation by integrating foundation knowledge

C Liu, P Li, H Zhang, L Li, Z Huang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Given an audio-visual pair, audio-visual segmentation (AVS) aims to locate sounding
sources by predicting pixel-wise maps. Previous methods assume that each sound …

Can CLIP Help Sound Source Localization?

S Park, A Senocak, JS Chung - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Large-scale pre-trained image-text models demonstrate remarkable versatility across
diverse tasks, benefiting from their robust representational capabilities and effective …

Exploiting visual context semantics for sound source localization

X Zhou, D Zhou, D Hu, H Zhou… - Proceedings of the …, 2023 - openaccess.thecvf.com
Self-supervised sound source localization in unconstrained visual scenes is an important
task of audio-visual learning. In this paper, we propose a visual reasoning module to …

Vision + X: A survey on multimodal learning in the light of data

Y Zhu, Y Wu, N Sebe, Y Yan - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
We are perceiving and communicating with the world in a multisensory manner, where
different information sources are sophisticatedly processed and interpreted by separate …

Cyclic Learning for Binaural Audio Generation and Localization

Z Li, B Zhao, Y Yuan - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Binaural audio is obtained by simulating the biological structure of human ears which plays
an important role in artificial immersive spaces. A promising approach is to utilize mono …

MarginNCE: Robust sound localization with a negative margin

S Park, A Senocak, JS Chung - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
The goal of this work is to localize sound sources in visual scenes with a self-supervised
approach. Contrastive learning in the context of sound source localization leverages the …