Self-supervised video forensics by audio-visual anomaly detection
Manipulated videos often contain subtle inconsistencies between their visual and audio
signals. We propose a video forensics method, based on anomaly detection, that can …
Audio-visual generalised zero-shot learning with cross-modal attention and language
Learning to classify video data from classes not included in the training data, i.e., video-based
zero-shot learning, is challenging. We conjecture that the natural alignment between the …
Audio-synchronized visual animation
Current visual generation methods can produce high-quality videos guided by text prompts.
However, effectively controlling object dynamics remains a challenge. This work explores …
Audio-visual segmentation with semantics
We propose a new problem called audio-visual segmentation (AVS), in which the goal is to
output a pixel-level map of the object(s) that produce sound at the time of the image frame …
Masked generative video-to-audio transformers with enhanced synchronicity
Video-to-audio (V2A) generation leverages visual-only video features to render
plausible sounds that match the scene. Importantly, the generated sound onsets should …
Foleycrafter: Bring silent videos to life with lifelike and synchronized sounds
We study Neural Foley, the automatic generation of high-quality sound effects synchronizing
with videos, enabling an immersive audio-visual experience. Despite its wide range of …
Reading to listen at the cocktail party: Multi-modal speech separation
The goal of this paper is speech separation and enhancement in multi-speaker and noisy
environments using a combination of different modalities. Previous works have shown good …
Self-supervised audio-visual soundscape stylization
Speech sounds convey a great deal of information about the scenes, resulting in a variety of
effects ranging from reverberation to additional ambient sounds. In this paper, we …
Vocalist: An audio-visual synchronisation model for lips and voices
VS Kadandale, JF Montesinos, G Haro - arXiv preprint arXiv:2204.02090, 2022 - arxiv.org
In this paper, we address the problem of lip-voice synchronisation in videos containing
human face and voice. Our approach is based on determining if the lips motion and the …
Sparse in space and time: Audio-visual synchronisation with trainable selectors
The objective of this paper is audio-visual synchronisation of general videos 'in the wild'. For
such videos, the events that may be harnessed for synchronisation cues may be spatially …