Deep audio-visual learning: A survey
Audio-visual learning, aimed at exploiting the relationship between audio and visual
modalities, has drawn considerable attention since deep learning started to be used …
Audio surveillance: A systematic review
Despite surveillance systems becoming increasingly ubiquitous in our living environment,
automated surveillance, currently based on video sensory modality and machine …
Self-supervised learning of audio-visual objects from video
Our objective is to transform a video into a set of discrete audio-visual objects using
self-supervised learning. To this end, we introduce a model that uses attention to localize and …
Localizing visual sounds the hard way
The objective of this work is to localize sound sources that are visible in a video without
using manual annotations. Our key technical contribution is to show that, by training the …
Cooperative learning of audio and video models from self-supervised synchronization
There is a natural correlation between the visual and auditive elements of a video. In this
work we leverage this connection to learn general and effective models for both audio and …
The sound of pixels
We introduce PixelPlayer, a system that, by leveraging large amounts of unlabeled videos,
learns to locate image regions which produce sounds and separate the input sounds into a …
Music gesture for visual sound separation
Recent deep learning approaches have achieved impressive performance on visual sound
separation tasks. However, these approaches are mostly built on appearance and optical …
Learning to localize sound source in visual scenes
Visual events are usually accompanied by sounds in our daily lives. We pose the question:
Can the machine learn the correspondence between visual scene and the sound, and …
The sound of motions
Sounds originate from object motions and vibrations of surrounding air. Inspired by the fact
that humans are capable of interpreting sound sources from how objects move visually, we …
Discriminative sounding objects localization via self-supervised audiovisual matching
Discriminatively localizing sounding objects in cocktail-party, i.e., mixed sound scenes, is
commonplace for humans, but still challenging for machines. In this paper, we propose a two …