Less can be more: Sound source localization with a classification model

K Sung-Bin, A Senocak, H Ha… - Proceedings of the …, 2023 - openaccess.thecvf.com

How does audio describe the world around us? In this paper, we propose a method for
generating an image of a scene from sound. Our method addresses the challenges of …

Cited by 32 - Related articles - All 6 versions

Sound source localization is all about cross-modal alignment

A Senocak, H Ryu, J Kim, TH Oh… - Proceedings of the …, 2023 - openaccess.thecvf.com

Humans can easily perceive the direction of sound sources in a visual scene, termed sound
source localization. Recent studies on learning-based sound source localization have …

Cited by 15 - Related articles - All 8 versions

Audio-visual segmentation by exploring cross-modal mutual semantics

C Liu, PP Li, X Qi, H Zhang, L Li, D Wang… - Proceedings of the 31st …, 2023 - dl.acm.org

The audio-visual segmentation (AVS) task aims to segment sounding objects from a given
video. Existing works mainly focus on fusing audio and visual features of a given video to …

Cited by 19 - Related articles - All 3 versions

AVE-CLIP: AudioCLIP-based multi-window temporal transformer for audio visual event localization

T Mahmud, D Marculescu - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

An audio-visual event (AVE) is denoted by the correspondence of the visual and auditory
signals in a video segment. Precise localization of the AVEs is very challenging since it …

Cited by 29 - Related articles - All 5 versions

BAVS: bootstrapping audio-visual segmentation by integrating foundation knowledge

C Liu, P Li, H Zhang, L Li, Z Huang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Given an audio-visual pair, audio-visual segmentation (AVS) aims to locate sounding
sources by predicting pixel-wise maps. Previous methods assume that each sound …

Cited by 17 - Related articles - All 3 versions

Can CLIP Help Sound Source Localization?

S Park, A Senocak, JS Chung - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Large-scale pre-trained image-text models demonstrate remarkable versatility across
diverse tasks, benefiting from their robust representational capabilities and effective …

Cited by 6 - Related articles - All 5 versions

Exploiting visual context semantics for sound source localization

X Zhou, D Zhou, D Hu, H Zhou… - Proceedings of the …, 2023 - openaccess.thecvf.com

Self-supervised sound source localization in unconstrained visual scenes is an important
task of audio-visual learning. In this paper, we propose a visual reasoning module to …

Cited by 10 - Related articles - All 3 versions

Vision+X: A survey on multimodal learning in the light of data

Y Zhu, Y Wu, N Sebe, Y Yan - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org

We are perceiving and communicating with the world in a multisensory manner, where
different information sources are sophisticatedly processed and interpreted by separate …

Cited by 14 - Related articles - All 2 versions

Cyclic Learning for Binaural Audio Generation and Localization

Z Li, B Zhao, Y Yuan - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Binaural audio is obtained by simulating the biological structure of human ears which plays
an important role in artificial immersive spaces. A promising approach is to utilize mono …

Cited by 3 - Related articles

MarginNCE: Robust sound localization with a negative margin

S Park, A Senocak, JS Chung - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

The goal of this work is to localize sound sources in visual scenes with a self-supervised
approach. Contrastive learning in the context of sound source localization leverages the …

Cited by 14 - Related articles - All 6 versions