Investigations on end-to-end audiovisual fusion

A Fernandez-Lopez, FM Sukno - Image and Vision Computing, 2018 - Elsevier

In the last few years, there has been an increasing interest in developing systems for
Automatic Lip-Reading (ALR). Similarly to other computer vision applications, methods …

Cited by 133 · Related articles · All 3 versions

Deep learning-based automated lip-reading: A survey

S Fenghour, D Chen, K Guo, B Li, P Xiao - IEEE Access, 2021 - ieeexplore.ieee.org

A survey on automated lip-reading approaches is presented in this paper with the main
focus being on deep learning related methodologies which have proven to be more fruitful …

Cited by 39 · Related articles · All 4 versions

Lip reading sentences using deep learning with only visual cues

S Fenghour, D Chen, K Guo, P Xiao - IEEE Access, 2020 - ieeexplore.ieee.org

In this paper, a neural network-based lip reading system is proposed. The system is lexicon-
free and uses purely visual cues. With only a limited number of visemes as classes to …

Cited by 51 · Related articles · All 3 versions

Lipformer: learning to lipread unseen speakers based on visual-landmark transformers

F Xue, Y Li, D Liu, Y Xie, L Wu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Lipreading refers to understanding and further translating the speech of a video speaker into
textual outputs. State-of-the-art lipreading methods excel in interpreting overlap speakers, i.e., …

Cited by 11 · Related articles · All 4 versions

A survey of research on lipreading technology

M Hao, M Mamut, N Yadikar, A Aysa, K Ubul - IEEE Access, 2020 - ieeexplore.ieee.org

Although automatic speech recognition (ASR) technology is mature, there are still some
unsolved problems, such as how to accurately identify what the speaker is saying in a noisy …

Cited by 37 · Related articles · All 3 versions

Review on research progress of machine lip reading

G Pu, H Wang - The Visual Computer, 2023 - Springer

Machine lip reading recognizes text content through the speaker's lip motion
information. Lip reading has significant research and application value. With the continuous …

Cited by 9 · Related articles · All 2 versions

CATNet: Cross-modal fusion for audio–visual speech recognition

X Wang, J Mi, B Li, Y Zhao, J Meng - Pattern Recognition Letters, 2024 - Elsevier

Automatic speech recognition (ASR) is a typical pattern recognition technology that converts
human speeches into texts. With the aid of advanced deep learning models, the …

Cited by 1 · Related articles · All 4 versions

Research on a Lip Reading Algorithm Based on Efficient-GhostNet

G Zhang, Y Lu - Electronics, 2023 - mdpi.com

Lip reading technology refers to the analysis of the visual information of the speaker's mouth
movements to recognize the content of the speaker's speech. As one of the important …

Cited by 5 · Related articles · All 3 versions

A Survey on Deep Multi-modal Learning for Body Language Recognition and Generation

L Liu, L Gao, W Lei, F Ma, X Lin, J Wang - arXiv preprint arXiv:2308.08849, 2023 - arxiv.org

Body language (BL) refers to the non-verbal communication expressed through physical
movements, gestures, facial expressions, and postures. It is a form of communication that …

Cited by 1 · Related articles · All 2 versions

Multi-Scale Hybrid Fusion Network for Mandarin Audio-Visual Speech Recognition

J Wang, Z Guo, C Yang, X Li… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org

Compared to feature or decision fusion, hybrid fusion can beneficially improve audio-visual
speech recognition accuracy. Existing works are mainly prone to design the multi-modality …

Cited by 2 · Related articles · All 3 versions