Multimodal analysis for identification and segmentation of moving-sounding objects

H Zhu, MD Luo, R Wang, AH Zheng, R He - International Journal of …, 2021 - Springer

Audio-visual learning, aimed at exploiting the relationship between audio and visual
modalities, has drawn considerable attention since deep learning started to be used …

Cited by 189 - Related articles - All 12 versions

Audio surveillance: A systematic review

M Crocco, M Cristani, A Trucco, V Murino - ACM Computing Surveys …, 2016 - dl.acm.org

Despite surveillance systems becoming increasingly ubiquitous in our living environment,
automated surveillance, currently based on video sensory modality and machine …

Cited by 328 - Related articles - All 6 versions

Self-supervised learning of audio-visual objects from video

T Afouras, A Owens, JS Chung, A Zisserman - Computer Vision–ECCV …, 2020 - Springer

Our objective is to transform a video into a set of discrete audio-visual objects using self-
supervised learning. To this end, we introduce a model that uses attention to localize and …

Cited by 286 - Related articles - All 8 versions

Localizing visual sounds the hard way

H Chen, W Xie, T Afouras, A Nagrani… - Proceedings of the …, 2021 - openaccess.thecvf.com

The objective of this work is to localize sound sources that are visible in a video without
using manual annotations. Our key technical contribution is to show that, by training the …

Cited by 209 - Related articles - All 7 versions

Cooperative learning of audio and video models from self-supervised synchronization

B Korbar, D Tran, L Torresani - Advances in Neural …, 2018 - proceedings.neurips.cc

There is a natural correlation between the visual and auditive elements of a video. In this
work we leverage this connection to learn general and effective models for both audio and …

Cited by 552 - Related articles - All 9 versions

The sound of pixels

H Zhao, C Gan, A Rouditchenko… - Proceedings of the …, 2018 - openaccess.thecvf.com

We introduce PixelPlayer, a system that, by leveraging large amounts of unlabeled videos,
learns to locate image regions which produce sounds and separate the input sounds into a …

Cited by 622 - Related articles - All 10 versions

Music gesture for visual sound separation

C Gan, D Huang, H Zhao… - Proceedings of the …, 2020 - openaccess.thecvf.com

Recent deep learning approaches have achieved impressive performance on visual sound
separation tasks. However, these approaches are mostly built on appearance and optical …

Cited by 226 - Related articles - All 9 versions

Learning to localize sound source in visual scenes

A Senocak, TH Oh, J Kim, MH Yang… - Proceedings of the …, 2018 - openaccess.thecvf.com

Visual events are usually accompanied by sounds in our daily lives. We pose the question:
Can the machine learn the correspondence between visual scene and the sound, and …

Cited by 384 - Related articles - All 9 versions

The sound of motions

H Zhao, C Gan, WC Ma… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

Sounds originate from object motions and vibrations of surrounding air. Inspired by the fact
that humans are capable of interpreting sound sources from how objects move visually, we …

Cited by 291 - Related articles - All 8 versions

Discriminative sounding objects localization via self-supervised audiovisual matching

D Hu, R Qian, M Jiang, X Tan, S Wen… - Advances in …, 2020 - proceedings.neurips.cc

Discriminatively localizing sounding objects in cocktail-party, i.e., mixed sound scenes, is
commonplace for humans, but still challenging for machines. In this paper, we propose a two …

Cited by 149 - Related articles - All 7 versions