Visual speech enhancement without a real visual stream

M Rehaan, N Kaur, S Kingra - Smart Science, 2024 - Taylor & Francis

With the progression of deep-learning techniques, digital media recording and synthesis
media generation have become exceptionally easy. Due to open access of user-friendly …

Cited by 8 - Related articles

[PDF] thecvf.com

AdVerb: Visually guided audio dereverberation

S Chowdhury, S Ghosh, S Dasgupta… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present AdVerb, a novel audio-visual dereverberation framework that uses visual cues
in addition to the reverberant sound to estimate clean audio. Although audio-only …

Cited by 8 - Related articles - All 6 versions

[PDF] apsipa.org

Lip sync matters: A novel multimodal forgery detector

SA Shahzad, A Hashmi, S Khan… - 2022 Asia-Pacific …, 2022 - ieeexplore.ieee.org

Deepfake technology has advanced a lot, but it is a double-sided sword for the community.
One can use it for beneficial purposes, such as restoring vintage content in old movies, or for …

Cited by 30 - Related articles - All 6 versions

[PDF] arxiv.org

Visual context-driven audio feature enhancement for robust end-to-end audio-visual speech recognition

J Hong, M Kim, D Yoo, YM Ro - arXiv preprint arXiv:2207.06020, 2022 - arxiv.org

This paper focuses on designing a noise-robust end-to-end Audio-Visual Speech
Recognition (AVSR) system. To this end, we propose Visual Context-driven Audio Feature …

Cited by 20 - Related articles - All 9 versions

Sensing to hear through memory: Ultrasound speech enhancement without real ultrasound signals

Q Zhang, K Liu, D Wang - Proceedings of the ACM on Interactive, Mobile …, 2024 - dl.acm.org

Speech enhancement on mobile devices is a very challenging task due to the complex
environmental noises. Recent works using lip-induced ultrasound signals for speech …

Cited by 1 - Related articles

[PDF] aaai.org

Visual Hallucination Elevates Speech Recognition

F Zhang, Y Zhu, X Wang, H Chen, X Sun… - Proceedings of the AAAI …, 2024 - ojs.aaai.org

Due to the detrimental impact of noise on the conventional audio speech recognition (ASR)
task, audio-visual speech recognition (AVSR) has been proposed by incorporating both …

Cited by 2 - Related articles

[PDF] arxiv.org

Incorporating ultrasound tongue images for audio-visual speech enhancement through knowledge distillation

RC Zheng, Y Ai, ZH Ling - arXiv preprint arXiv:2305.14933, 2023 - arxiv.org

Audio-visual speech enhancement (AV-SE) aims to enhance degraded speech along with
extra visual information such as lip videos, and has been shown to be more effective than …

Cited by 5 - Related articles - All 4 versions

[PDF] frontiersin.org

Speech-driven facial animations improve speech-in-noise comprehension of humans

E Varano, K Vougioukas, P Ma, S Petridis… - Frontiers in …, 2022 - frontiersin.org

Understanding speech becomes a demanding task when the environment is noisy.
Comprehension of speech in noise can be substantially improved by looking at the …

Cited by 6 - Related articles - All 9 versions

Vision-guided music source separation via a fine-grained cycle-separation network

M Shuo, Y Ji, X Xu, X Zhu - Proceedings of the 29th ACM International …, 2021 - dl.acm.org

Music source separation from a sound mixture remains a big challenge because there often
exist heavy overlaps and interactions among similar music signals. In order to correctly …

Cited by 6 - Related articles

[PDF] openreview.net

Filter-Recovery Network for Multi-Speaker Audio-Visual Speech Separation

H Cheng, Z Liu, W Wu, L Wang - The Eleventh International …, 2023 - openreview.net

In this paper, we systematically study the audio-visual speech separation task in a multi-
speaker scenario. Given the facial information of each speaker, the goal of this task is to …

Cited by 4 - Related articles