Video-guided sound source separation - Academic Resource Search

Video-guided sound source separation

J Zhou, F Wang, D Guo, H Liu, F Sun - … 8–11, 2019, Proceedings, Part I 12, 2019 - Springer

… The visual and audio information usually jointly help human’s recognition. Motivated by the
… information separating sound sources better, we intend to guide sound source separation …

Cited by 1 · Related articles

[PDF] neurips.cc

Learning audio-visual dynamics using scene graphs for audio source separation

M Chatterjee, N Ahuja… - Advances in Neural …, 2022 - proceedings.neurips.cc

… for video-guided audio source separation from an acoustic mixture that can also predict the
direction of motion of the sound source. … To achieve audio source separation, we propose a …

Cited by 11 · Related articles · All 6 versions

[PDF] arxiv.org

Improving on-screen sound separation for open-domain videos with audio-visual self-attention

E Tzinis, S Wisdom, T Remez, JR Hershey - arXiv preprint arXiv …, 2021 - arxiv.org

… audio-visual classifier per source by utilizing the powerful representations obtained from
unsupervised pre-training of the audio source separation … describes video-guided attention …

Cited by 9 · Related articles · All 2 versions

As We Speak: Real-Time Visually Guided Speaker Separation and Localization

P Czarnecki, J Tkaczuk - … Workshop on Multimedia Signal …, 2022 - ieeexplore.ieee.org

… Our model performs a monaural (single-microphone) video guided speaker separation. “Fig…
Rahtu, “Visually guided sound source separation and localization using self-supervised …

Cited by 1 · Related articles

[PDF] arxiv.org

RAVSS: Robust Audio-Visual Speech Separation in Multi-Speaker Scenarios with Missing Visual Cues

T Pan, J Liu, B Wang, J Tang, G Wu - Proceedings of the 32nd ACM …, 2024 - dl.acm.org

… separated features and the order of clean audio labels, we employ distinct training strategies
for the video-guided … [18, 48] combined with the scale-invariant signal-to-distortion ratio (SI-…

Cited by 1 · Related articles · All 4 versions

[PDF] arxiv.org

Deep video inpainting guided by audio-visual self-supervision

K Kim, J Jung, WJ Kim, SE Yoon - … , Speech and Signal …, 2022 - ieeexplore.ieee.org

… audio signal as an important cue for restoring the corrupted frame. Given the prior information
of audio-visual correlation that AV-Net provides, we propose two novel audio-visual losses …

Cited by 3 · Related articles · All 5 versions

Joint learning of audio–visual saliency prediction and sound source localization on multi-face videos

M Qiao, Y Liu, M Xu, X Deng, B Li, W Hu… - International Journal of …, 2024 - Springer

… (1) We supplement a profound analysis on the factors that influence sound source
localization, motivating us to embed sound source localization as an auxiliary task for saliency …

Cited by 8 · Related articles · All 2 versions

Self-supervised learning for alignment of objects and sound

X Liu, X Liu, D Guo, H Liu, F Sun… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org

… several audio-only sound source separation baselines including RPCA, HPSS, and NMF
methods. At the same time, we also compare our method with the sound source separation …

Cited by 4 · Related articles

[PDF] av4d.org

[PDF] Video-guided speech inpainting transformer

JF Montesinos, D Michelsanti, G Haro, ZH Tan… - av4d.org

… Specifically, this paper focuses on the problem of audio-visual speech … audio signal is known
as audio inpainting [1]. Carrying out such a restoration for long segments of corrupted audio …

[PDF] arxiv.org

Joint learning of visual-audio saliency prediction and sound source localization on multi-face videos

M Qiao, Y Liu, M Xu, X Deng, B Li, W Hu… - arXiv preprint arXiv …, 2021 - arxiv.org

… 3) We conduct additional experiments on both sound source localization and saliency
prediction, e.g., comparing with more methods, and evaluating on more databases, as well as …

Cited by 4 · Related articles · All 3 versions