Video-guided sound source separation

J Zhou, F Wang, D Guo, H Liu, F Sun - … 8–11, 2019, Proceedings, Part I 12, 2019 - Springer
… The visual and audio information usually jointly help human’s recognition. Motivated by the
… information separating sound sources better, we intend to guide sound source separation

Learning audio-visual dynamics using scene graphs for audio source separation

M Chatterjee, N Ahuja… - Advances in Neural …, 2022 - proceedings.neurips.cc
… for video-guided audio source separation from an acoustic mixture that can also predict the
direction of motion of the sound source. … To achieve audio source separation, we propose a …

Improving on-screen sound separation for open-domain videos with audio-visual self-attention

E Tzinis, S Wisdom, T Remez, JR Hershey - arXiv preprint arXiv …, 2021 - arxiv.org
audio-visual classifier per source by utilizing the powerful representations obtained from
unsupervised pre-training of the audio source separation … describes video-guided attention …

As We Speak: Real-Time Visually Guided Speaker Separation and Localization

P Czarnecki, J Tkaczuk - … Workshop on Multimedia Signal …, 2022 - ieeexplore.ieee.org
… Our model performs a monaural (single-microphone) video guided speaker separation. “Fig…
Rahtu, “Visually guided sound source separation and localization using self-supervised …

RAVSS: Robust Audio-Visual Speech Separation in Multi-Speaker Scenarios with Missing Visual Cues

T Pan, J Liu, B Wang, J Tang, G Wu - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
separated features and the order of clean audio labels, we employ distinct training strategies
for the video-guided … [18, 48] combined with the scale-invariant signal-to-distortion ratio (SI-…

Deep video inpainting guided by audio-visual self-supervision

K Kim, J Jung, WJ Kim, SE Yoon - … , Speech and Signal …, 2022 - ieeexplore.ieee.org
audio signal as an important cue for restoring the corrupted frame. Given the prior information
of audio-visual correlation that AV-Net provides, we propose two novel audio-visual losses …

Joint learning of audio–visual saliency prediction and sound source localization on multi-face videos

M Qiao, Y Liu, M Xu, X Deng, B Li, W Hu… - International Journal of …, 2024 - Springer
… (1) We supplement a profound analysis on the factors that influence sound source
localization, motivating us to embed sound source localization as an auxiliary task for saliency …

Self-supervised learning for alignment of objects and sound

X Liu, X Liu, D Guo, H Liu, F Sun… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
… several audio-only sound source separation baselines including RPCA, HPSS, and NMF
methods. At the same time, we also compare our method with the sound source separation

Video-guided speech inpainting transformer

… Specifically, this paper focuses on the problem of audio-visual speech … audio signal is known
as audio inpainting [1]. Carrying out such a restoration for long segments of corrupted audio

Joint learning of visual-audio saliency prediction and sound source localization on multi-face videos

M Qiao, Y Liu, M Xu, X Deng, B Li, W Hu… - arXiv preprint arXiv …, 2021 - arxiv.org
… 3) We conduct additional experiments on both sound source localization and saliency
prediction, e.g., comparing with more methods, and evaluating on more databases, as well as …