Deep audio-visual learning: A survey

H Zhu, MD Luo, R Wang, AH Zheng, R He - International Journal of …, 2021 - Springer
Audio-visual learning, aimed at exploiting the relationship between audio and visual
modalities, has drawn considerable attention since deep learning started to be used …

Audio surveillance: A systematic review

M Crocco, M Cristani, A Trucco, V Murino - ACM Computing Surveys …, 2016 - dl.acm.org
Despite surveillance systems becoming increasingly ubiquitous in our living environment,
automated surveillance, currently based on video sensory modality and machine …

Self-supervised learning of audio-visual objects from video

T Afouras, A Owens, JS Chung, A Zisserman - Computer Vision–ECCV …, 2020 - Springer
Our objective is to transform a video into a set of discrete audio-visual objects using
self-supervised learning. To this end, we introduce a model that uses attention to localize and …

Localizing visual sounds the hard way

H Chen, W Xie, T Afouras, A Nagrani… - Proceedings of the …, 2021 - openaccess.thecvf.com
The objective of this work is to localize sound sources that are visible in a video without
using manual annotations. Our key technical contribution is to show that, by training the …

Cooperative learning of audio and video models from self-supervised synchronization

B Korbar, D Tran, L Torresani - Advances in Neural …, 2018 - proceedings.neurips.cc
There is a natural correlation between the visual and auditive elements of a video. In this
work we leverage this connection to learn general and effective models for both audio and …

The sound of pixels

H Zhao, C Gan, A Rouditchenko… - Proceedings of the …, 2018 - openaccess.thecvf.com
We introduce PixelPlayer, a system that, by leveraging large amounts of unlabeled videos,
learns to locate image regions which produce sounds and separate the input sounds into a …

Music gesture for visual sound separation

C Gan, D Huang, H Zhao… - Proceedings of the …, 2020 - openaccess.thecvf.com
Recent deep learning approaches have achieved impressive performance on visual sound
separation tasks. However, these approaches are mostly built on appearance and optical …

Learning to localize sound source in visual scenes

A Senocak, TH Oh, J Kim, MH Yang… - Proceedings of the …, 2018 - openaccess.thecvf.com
Visual events are usually accompanied by sounds in our daily lives. We pose the question:
Can the machine learn the correspondence between visual scene and the sound, and …

The sound of motions

H Zhao, C Gan, WC Ma… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Sounds originate from object motions and vibrations of surrounding air. Inspired by the fact
that humans are capable of interpreting sound sources from how objects move visually, we …

Discriminative sounding objects localization via self-supervised audiovisual matching

D Hu, R Qian, M Jiang, X Tan, S Wen… - Advances in …, 2020 - proceedings.neurips.cc
Discriminatively localizing sounding objects in cocktail-party, i.e., mixed sound scenes, is
commonplace for humans, but still challenging for machines. In this paper, we propose a two …