A closer look at weakly-supervised audio-visual source localization
S Mo, P Morgado - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Audio-visual source localization is a challenging task that aims to predict the location of
visual sound sources in a video. Since collecting ground-truth annotations of sounding …
Annotation-free audio-visual segmentation
J Liu, Y Wang, C Ju, C Ma… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract The objective of Audio-Visual Segmentation (AVS) is to localise the sounding
objects within visual scenes by accurately predicting pixel-wise segmentation masks. To …
iQuery: Instruments as queries for audio-visual sound separation
J Chen, R Zhang, D Lian, J Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Current audio-visual separation methods share a standard architecture design where an
audio encoder-decoder network is fused with visual encoding features at the encoder …
Dual mean-teacher: An unbiased semi-supervised framework for audio-visual source localization
Y Guo, S Ma, H Su, Z Wang, Y Zhao… - Advances in …, 2024 - proceedings.neurips.cc
Abstract Audio-Visual Source Localization (AVSL) aims to locate sounding objects within
video frames given the paired audio clips. Existing methods predominantly rely on self …
Audio-Visual Segmentation via Unlabeled Frame Exploitation
J Liu, Y Liu, F Zhang, C Ju… - Proceedings of the …, 2024 - openaccess.thecvf.com
Audio-visual segmentation (AVS) aims to segment the sounding objects in video frames.
Although great progress has been witnessed, we experimentally reveal that current methods …
UNSSOR: unsupervised neural speech separation by leveraging over-determined training mixtures
ZQ Wang, S Watanabe - Advances in Neural Information …, 2024 - proceedings.neurips.cc
In reverberant conditions with multiple concurrent speakers, each microphone acquires a
mixture signal of multiple speakers at a different location. In over-determined conditions …
Sound localization from motion: Jointly learning sound direction and camera rotation
Z Chen, S Qian, A Owens - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
The images and sounds that we perceive undergo subtle but geometrically consistent
changes as we rotate our heads. In this paper, we use these cues to solve a problem we call …
LAVSS: Location-guided audio-visual spatial audio separation
Y Ye, W Yang, Y Tian - Proceedings of the IEEE/CVF Winter …, 2024 - openaccess.thecvf.com
Existing machine learning research has achieved promising results in monaural audio-
visual separation (MAVS). However, most MAVS methods purely consider what the sound …
Multimodal imbalance-aware gradient modulation for weakly-supervised audio-visual video parsing
J Fu, J Gao, BK Bao, C Xu - … on Circuits and Systems for Video …, 2023 - ieeexplore.ieee.org
Weakly-supervised audio-visual video parsing (WS-AVVP) aims to localize the temporal
extents of audio, visual and audio-visual event instances as well as identify the …
CrossMAE: Cross-Modality Masked Autoencoders for Region-Aware Audio-Visual Pre-Training
Y Guo, S Sun, S Ma, K Zheng, X Bao… - Proceedings of the …, 2024 - openaccess.thecvf.com
Learning joint and coordinated features across modalities is essential for many audio-visual
tasks. Existing pre-training methods primarily focus on global information, neglecting fine …