Sound source localization is all about cross-modal alignment
Humans can easily perceive the direction of sound sources in a visual scene, termed sound
source localization. Recent studies on learning-based sound source localization have …
Meerkat: Audio-visual large language model for grounding in space and time
Leveraging Large Language Models' remarkable proficiency in text-based tasks,
recent works on Multi-modal LLMs (MLLMs) extend them to other modalities like vision and …
Audio-Visual Segmentation via Unlabeled Frame Exploitation
Audio-visual segmentation (AVS) aims to segment the sounding objects in video frames.
Although great progress has been witnessed, we experimentally reveal that current methods …
Can CLIP Help Sound Source Localization?
Large-scale pre-trained image-text models demonstrate remarkable versatility across
diverse tasks, benefiting from their robust representational capabilities and effective …
Audio–Visual Segmentation based on robust principal component analysis
Audio–Visual Segmentation (AVS) aims to extract the sounding objects from a
video. The current learning-based AVS methods are often supervised, which rely on specific …
Increasing Importance of Joint Analysis of Audio and Video in Computer Vision: A Survey
A Shahabaz, S Sarkar - IEEE Access, 2024 - ieeexplore.ieee.org
The joint analysis of audio and video is a powerful tool that can be applied to various
contexts, including action, speech, and sound recognition, audio-visual video parsing …
Audio-visual spatial integration and recursive attention for robust sound source localization
The objective of the sound source localization task is to enable machines to detect the
location of sound-making objects within a visual scene. While the audio modality provides …
Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge
The goal of the multi-sound source localization task is to localize sound sources from the
mixture individually. While recent multi-sound source localization methods have shown …
Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment
Recent studies on learning-based sound source localization have mainly focused on the
localization performance perspective. However, prior work and existing benchmarks …
Robust Audio-Visual Contrastive Learning for Proposal-based Self-supervised Sound Source Localization in Videos
By observing a scene and listening to corresponding audio cues, humans can easily
recognize where the sound is. To achieve such cross-modal perception on machines …