Robust Audio-Visual Contrastive Learning for Proposal-based Self-supervised Sound Source Localization in Videos

H Xuan, Z Wu, J Yang, B Jiang, L Luo… - IEEE transactions on …, 2024 - ieeexplore.ieee.org
By observing a scene and listening to corresponding audio cues, humans can easily
recognize where the sound is. To achieve such cross-modal perception on machines …