Face manipulated deepfake generation and recognition approaches: A survey
M Rehaan, N Kaur, S Kingra - Smart Science, 2024 - Taylor & Francis
With the progression of deep-learning techniques, digital media recording and synthesis
media generation have become exceptionally easy. Due to open access of user-friendly …
AdVerb: Visually guided audio dereverberation
We present AdVerb, a novel audio-visual dereverberation framework that uses visual cues
in addition to the reverberant sound to estimate clean audio. Although audio-only …
Lip sync matters: A novel multimodal forgery detector
Deepfake technology has advanced a lot, but it is a double-sided sword for the community.
One can use it for beneficial purposes, such as restoring vintage content in old movies, or for …
Visual context-driven audio feature enhancement for robust end-to-end audio-visual speech recognition
This paper focuses on designing a noise-robust end-to-end Audio-Visual Speech
Recognition (AVSR) system. To this end, we propose Visual Context-driven Audio Feature …
Sensing to hear through memory: Ultrasound speech enhancement without real ultrasound signals
Speech enhancement on mobile devices is a very challenging task due to the complex
environmental noises. Recent works using lip-induced ultrasound signals for speech …
Visual Hallucination Elevates Speech Recognition
Due to the detrimental impact of noise on the conventional audio speech recognition (ASR)
task, audio-visual speech recognition (AVSR) has been proposed by incorporating both …
Incorporating ultrasound tongue images for audio-visual speech enhancement through knowledge distillation
Audio-visual speech enhancement (AV-SE) aims to enhance degraded speech along with
extra visual information such as lip videos, and has been shown to be more effective than …
Speech-driven facial animations improve speech-in-noise comprehension of humans
Understanding speech becomes a demanding task when the environment is noisy.
Comprehension of speech in noise can be substantially improved by looking at the …
Vision-guided music source separation via a fine-grained cycle-separation network
Music source separation from a sound mixture remains a big challenge because there often
exist heavy overlaps and interactions among similar music signals. In order to correctly …
Filter-Recovery Network for Multi-Speaker Audio-Visual Speech Separation
In this paper, we systematically study the audio-visual speech separation task in a multi-
speaker scenario. Given the facial information of each speaker, the goal of this task is to …