Robust Audio-Visual Contrastive Learning for Proposal-based Self-supervised Sound Source Localization in Videos
By observing a scene and listening to the corresponding audio cues, humans can easily
recognize where a sound comes from. To achieve such cross-modal perception in machines …