Watch or listen: Robust audio-visual speech recognition with visual corruption modeling and reliability scoring

J Hong, M Kim, J Choi, YM Ro - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
This paper deals with Audio-Visual Speech Recognition (AVSR) under a multimodal input
corruption situation in which both audio and visual inputs are corrupted, which is not …

Distinguishing homophenes using multi-head visual-audio memory for lip reading

M Kim, JH Yeo, YM Ro - Proceedings of the AAAI conference on …, 2022 - ojs.aaai.org
Recognizing speech from silent lip movements, which is called lip reading, is a challenging
task due to 1) the inherent information insufficiency of lip movement to fully represent the …

Lip reading for low-resource languages by learning and combining general speech knowledge and language-specific knowledge

M Kim, JH Yeo, J Choi, YM Ro - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
This paper proposes a novel lip reading framework, especially for low-resource languages,
which has not been well addressed in the previous literature. Since low-resource languages …

Speaker-adaptive lip reading with user-dependent padding

M Kim, H Kim, YM Ro - European Conference on Computer Vision, 2022 - Springer
Lip reading aims to predict speech based on lip movements alone. As it focuses on visual
information to model the speech, its performance is inherently sensitive to personal lip …

Analyzing lower half facial gestures for lip reading applications: Survey on vision techniques

SJ Preethi - Computer Vision and Image Understanding, 2023 - Elsevier
Lip reading has gained popularity due to the proliferation of emerging real-world
applications. This article provides a comprehensive review of benchmark datasets available …

DiffV2S: Diffusion-based video-to-speech synthesis with vision-guided speaker embedding

J Choi, J Hong, YM Ro - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Recent research has demonstrated impressive results in video-to-speech synthesis which
involves reconstructing speech solely from visual input. However, previous works have …

SVTS: scalable video-to-speech synthesis

R Mira, A Haliassos, S Petridis… - arXiv preprint …, 2022 - opus.bibliothek.uni-augsburg.de
Video-to-speech synthesis (also known as lip-to-speech) refers to the translation of silent lip
movements into the corresponding audio. This task has received an increasing amount of …

Intelligible lip-to-speech synthesis with speech units

J Choi, M Kim, YM Ro - arXiv preprint arXiv:2305.19603, 2023 - arxiv.org
In this paper, we propose a novel Lip-to-Speech synthesis (L2S) framework for synthesizing
intelligible speech from a silent lip movement video. Specifically, to complement the …

Lip-to-speech synthesis in the wild with multi-task learning

M Kim, J Hong, YM Ro - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
Recent studies have shown impressive performance in Lip-to-speech synthesis, which aims to
reconstruct speech from visual information alone. However, they have suffered from …

Visual context-driven audio feature enhancement for robust end-to-end audio-visual speech recognition

J Hong, M Kim, D Yoo, YM Ro - arXiv preprint arXiv:2207.06020, 2022 - arxiv.org
This paper focuses on designing a noise-robust end-to-end Audio-Visual Speech
Recognition (AVSR) system. To this end, we propose Visual Context-driven Audio Feature …