Visually guided sound source separation using cascaded opponent filter network

D Michelsanti, ZH Tan, SX Zhang, Y Xu… - … on Audio, Speech …, 2021 - ieeexplore.ieee.org

Speech enhancement and speech separation are two related tasks, whose purpose is to
extract either one or more target speech signals, respectively, from a mixture of sounds …

Cited by 275 · Related articles · All 6 versions

A better use of audio-visual cues: Dense video captioning with bi-modal transformer

V Iashin, E Rahtu - arXiv preprint arXiv:2005.08271, 2020 - arxiv.org

Dense video captioning aims to localize and describe important events in untrimmed videos.
Existing methods mainly tackle this task by exploiting only visual features, while completely …

Cited by 158 · Related articles · All 5 versions

Taming visually guided sound generation

V Iashin, E Rahtu - arXiv preprint arXiv:2110.08791, 2021 - arxiv.org

Recent advances in visually-induced audio generation are based on sampling short,
low-fidelity, and one-class sounds. Moreover, sampling 1 second of audio from the state-of-the …

Cited by 86 · Related articles · All 6 versions

Into the wild with AudioScope: Unsupervised audio-visual separation of on-screen sounds

E Tzinis, S Wisdom, A Jansen, S Hershey… - arXiv preprint arXiv …, 2020 - arxiv.org

Recent progress in deep learning has enabled many advances in sound separation and
visual scene understanding. However, extracting sound sources which are apparent in …

Cited by 73 · Related articles · All 9 versions

iQuery: Instruments as queries for audio-visual sound separation

J Chen, R Zhang, D Lian, J Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Current audio-visual separation methods share a standard architecture design where an
audio encoder-decoder network is fused with visual encoding features at the encoder …

Cited by 18 · Related articles · All 5 versions

Recent advances and challenges in deep audio-visual correlation learning

L Vilaça, Y Yu, P Viana - arXiv preprint arXiv:2202.13673, 2022 - arxiv.org

Audio-visual correlation learning aims to capture essential correspondences and
understand natural phenomena between audio and video. With the rapid growth of deep …

Cited by 8 · Related articles · All 3 versions

LAVSS: Location-guided audio-visual spatial audio separation

Y Ye, W Yang, Y Tian - Proceedings of the IEEE/CVF Winter …, 2024 - openaccess.thecvf.com

Existing machine learning research has achieved promising results in monaural
audio-visual separation (MAVS). However, most MAVS methods purely consider what the sound …

Cited by 6 · Related articles · All 5 versions

Visually guided sound source separation and localization using self-supervised motion representations

L Zhu, E Rahtu - Proceedings of the IEEE/CVF Winter …, 2022 - openaccess.thecvf.com

In this paper, we perform audio-visual sound source separation, i.e., to separate component
audios from a mixture based on the videos of sound sources. Moreover, we aim to pinpoint …

Cited by 24 · Related articles · All 8 versions

V-SlowFast network for efficient visual sound separation

L Zhu, E Rahtu - Proceedings of the IEEE/CVF Winter …, 2022 - openaccess.thecvf.com

The objective of this paper is to perform visual sound separation: i) we study visual sound
separation on spectrograms of different temporal resolutions; ii) we propose a new light yet …

Cited by 10 · Related articles · All 8 versions

A cappella: Audio-visual singing voice separation

JF Montesinos, VS Kadandale, G Haro - arXiv preprint arXiv:2104.09946, 2021 - arxiv.org

The task of isolating a target singing voice in music videos has useful applications. In this
work, we explore the single-channel singing voice separation problem from a multimodal …

Cited by 21 · Related articles · All 6 versions