Hear the flow: Optical flow-based self-supervised visual sound source localization

A Senocak, H Ryu, J Kim, TH Oh… - Proceedings of the …, 2023 - openaccess.thecvf.com

Humans can easily perceive the direction of sound sources in a visual scene, termed sound
source localization. Recent studies on learning-based sound source localization have …

Cited by 15 · Related articles · All 8 versions

Meerkat: Audio-visual large language model for grounding in space and time

S Chowdhury, S Nag, S Dasgupta, J Chen… - … on Computer Vision, 2025 - Springer

Leveraging Large Language Models' remarkable proficiency in text-based tasks,
recent works on Multi-modal LLMs (MLLMs) extend them to other modalities like vision and …

Cited by 5 · Related articles · All 11 versions

Audio-Visual Segmentation via Unlabeled Frame Exploitation

J Liu, Y Liu, F Zhang, C Ju… - Proceedings of the …, 2024 - openaccess.thecvf.com

Audio-visual segmentation (AVS) aims to segment the sounding objects in video frames.
Although great progress has been witnessed, we experimentally reveal that current methods …

Cited by 5 · Related articles · All 4 versions

Can CLIP Help Sound Source Localization?

S Park, A Senocak, JS Chung - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Large-scale pre-trained image-text models demonstrate remarkable versatility across
diverse tasks, benefiting from their robust representational capabilities and effective …

Cited by 7 · Related articles · All 5 versions

Audio–Visual Segmentation based on robust principal component analysis

S Fang, Q Zhu, Q Wu, S Wu, S Xie - Expert Systems with Applications, 2024 - Elsevier

Audio–Visual Segmentation (AVS) aims to extract the sounding objects from a
video. The current learning-based AVS methods are often supervised, which rely on specific …

Cited by 1 · Related articles · All 2 versions

Increasing Importance of Joint Analysis of Audio and Video in Computer Vision: A Survey

A Shahabaz, S Sarkar - IEEE Access, 2024 - ieeexplore.ieee.org

The joint analysis of audio and video is a powerful tool that can be applied to various
contexts, including action, speech, and sound recognition, audio-visual video parsing …

Cited by 2 · Related articles · All 2 versions

Audio-visual spatial integration and recursive attention for robust sound source localization

SJ Um, D Kim, JU Kim - Proceedings of the 31st ACM International …, 2023 - dl.acm.org

The objective of the sound source localization task is to enable machines to detect the
location of sound-making objects within a visual scene. While the audio modality provides …

Cited by 3 · Related articles · All 4 versions

Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge

D Kim, SJ Um, S Lee, JU Kim - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

The goal of the multi-sound source localization task is to localize sound sources from the
mixture individually. While recent multi-sound source localization methods have shown …

Cited by 2 · Related articles · All 3 versions

Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment

A Senocak, H Ryu, J Kim, TH Oh, H Pfister… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent studies on learning-based sound source localization have mainly focused on the
localization performance perspective. However, prior work and existing benchmarks …

Cited by 1 · Related articles · All 2 versions

Robust Audio-Visual Contrastive Learning for Proposal-based Self-supervised Sound Source Localization in Videos

H Xuan, Z Wu, J Yang, B Jiang, L Luo… - IEEE transactions on …, 2024 - ieeexplore.ieee.org

By observing a scene and listening to corresponding audio cues, humans can easily
recognize where the sound is. To achieve such cross-modal perception on machines …

Cited by 4 · Related articles · All 5 versions