A closer look at weakly-supervised audio-visual source localization

S Mo, P Morgado - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Audio-visual source localization is a challenging task that aims to predict the location of
visual sound sources in a video. Since collecting ground-truth annotations of sounding …

Annotation-free audio-visual segmentation

J Liu, Y Wang, C Ju, C Ma… - Proceedings of the …, 2024 - openaccess.thecvf.com
The objective of Audio-Visual Segmentation (AVS) is to localise the sounding
objects within visual scenes by accurately predicting pixel-wise segmentation masks. To …

iQuery: Instruments as queries for audio-visual sound separation

J Chen, R Zhang, D Lian, J Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Current audio-visual separation methods share a standard architecture design where an
audio encoder-decoder network is fused with visual encoding features at the encoder …

Dual mean-teacher: An unbiased semi-supervised framework for audio-visual source localization

Y Guo, S Ma, H Su, Z Wang, Y Zhao… - Advances in …, 2024 - proceedings.neurips.cc
Audio-Visual Source Localization (AVSL) aims to locate sounding objects within
video frames given the paired audio clips. Existing methods predominantly rely on self …

Audio-Visual Segmentation via Unlabeled Frame Exploitation

J Liu, Y Liu, F Zhang, C Ju… - Proceedings of the …, 2024 - openaccess.thecvf.com
Audio-visual segmentation (AVS) aims to segment the sounding objects in video frames.
Although great progress has been witnessed, we experimentally reveal that current methods …

UNSSOR: unsupervised neural speech separation by leveraging over-determined training mixtures

ZQ Wang, S Watanabe - Advances in Neural Information …, 2024 - proceedings.neurips.cc
In reverberant conditions with multiple concurrent speakers, each microphone acquires a
mixture signal of multiple speakers at a different location. In over-determined conditions …

Sound localization from motion: Jointly learning sound direction and camera rotation

Z Chen, S Qian, A Owens - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
The images and sounds that we perceive undergo subtle but geometrically consistent
changes as we rotate our heads. In this paper, we use these cues to solve a problem we call …

LAVSS: Location-guided audio-visual spatial audio separation

Y Ye, W Yang, Y Tian - Proceedings of the IEEE/CVF Winter …, 2024 - openaccess.thecvf.com
Existing machine learning research has achieved promising results in monaural audio-
visual separation (MAVS). However, most MAVS methods purely consider what the sound …

Multimodal imbalance-aware gradient modulation for weakly-supervised audio-visual video parsing

J Fu, J Gao, BK Bao, C Xu - … on Circuits and Systems for Video …, 2023 - ieeexplore.ieee.org
Weakly-supervised audio-visual video parsing (WS-AVVP) aims to localize the temporal
extents of audio, visual and audio-visual event instances as well as identify the …

CrossMAE: Cross-Modality Masked Autoencoders for Region-Aware Audio-Visual Pre-Training

Y Guo, S Sun, S Ma, K Zheng, X Bao… - Proceedings of the …, 2024 - openaccess.thecvf.com
Learning joint and coordinated features across modalities is essential for many audio-visual
tasks. Existing pre-training methods primarily focus on global information, neglecting fine …