Face manipulated deepfake generation and recognition approaches: A survey

M Rehaan, N Kaur, S Kingra - Smart Science, 2024 - Taylor & Francis
With the progression of deep-learning techniques, digital media recording and synthesis
media generation have become exceptionally easy. Due to open access of user-friendly …

AdVerb: Visually guided audio dereverberation

S Chowdhury, S Ghosh, S Dasgupta… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present AdVerb, a novel audio-visual dereverberation framework that uses visual cues
in addition to the reverberant sound to estimate clean audio. Although audio-only …

Lip sync matters: A novel multimodal forgery detector

SA Shahzad, A Hashmi, S Khan… - 2022 Asia-Pacific …, 2022 - ieeexplore.ieee.org
Deepfake technology has advanced a lot, but it is a double-sided sword for the community.
One can use it for beneficial purposes, such as restoring vintage content in old movies, or for …

Visual context-driven audio feature enhancement for robust end-to-end audio-visual speech recognition

J Hong, M Kim, D Yoo, YM Ro - arXiv preprint arXiv:2207.06020, 2022 - arxiv.org
This paper focuses on designing a noise-robust end-to-end Audio-Visual Speech
Recognition (AVSR) system. To this end, we propose Visual Context-driven Audio Feature …

Sensing to hear through memory: Ultrasound speech enhancement without real ultrasound signals

Q Zhang, K Liu, D Wang - Proceedings of the ACM on Interactive, Mobile …, 2024 - dl.acm.org
Speech enhancement on mobile devices is a very challenging task due to the complex
environmental noises. Recent works using lip-induced ultrasound signals for speech …

Visual Hallucination Elevates Speech Recognition

F Zhang, Y Zhu, X Wang, H Chen, X Sun… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Due to the detrimental impact of noise on the conventional audio speech recognition (ASR)
task, audio-visual speech recognition (AVSR) has been proposed by incorporating both …

Incorporating ultrasound tongue images for audio-visual speech enhancement through knowledge distillation

RC Zheng, Y Ai, ZH Ling - arXiv preprint arXiv:2305.14933, 2023 - arxiv.org
Audio-visual speech enhancement (AV-SE) aims to enhance degraded speech along with
extra visual information such as lip videos, and has been shown to be more effective than …

Speech-driven facial animations improve speech-in-noise comprehension of humans

E Varano, K Vougioukas, P Ma, S Petridis… - Frontiers in …, 2022 - frontiersin.org
Understanding speech becomes a demanding task when the environment is noisy.
Comprehension of speech in noise can be substantially improved by looking at the …

Vision-guided music source separation via a fine-grained cycle-separation network

M Shuo, Y Ji, X Xu, X Zhu - Proceedings of the 29th ACM International …, 2021 - dl.acm.org
Music source separation from a sound mixture remains a big challenge because there often
exist heavy overlaps and interactions among similar music signals. In order to correctly …

Filter-Recovery Network for Multi-Speaker Audio-Visual Speech Separation

H Cheng, Z Liu, W Wu, L Wang - The Eleventh International …, 2023 - openreview.net
In this paper, we systematically study the audio-visual speech separation task in a
multi-speaker scenario. Given the facial information of each speaker, the goal of this task is to …