Lipreading with long short-term memory

H Zhu, MD Luo, R Wang, AH Zheng, R He - International Journal of …, 2021 - Springer

Audio-visual learning, aimed at exploiting the relationship between audio and visual
modalities, has drawn considerable attention since deep learning started to be used …

Cited by 163 Related articles All 12 versions

[HTML] sciencedirect.com

[HTML] An overview on data representation learning: From traditional feature learning to recent deep learning

G Zhong, LN Wang, X Ling, J Dong - The Journal of Finance and Data …, 2016 - Elsevier

Since about 100 years ago, to learn the intrinsic structure of data, many representation
learning approaches have been proposed, either linear or nonlinear, either supervised or …

Cited by 250 Related articles All 5 versions

[PDF] arxiv.org

Deep audio-visual speech recognition

T Afouras, JS Chung, A Senior… - IEEE transactions on …, 2018 - ieeexplore.ieee.org

The goal of this work is to recognise phrases and sentences being spoken by a talking face,
with or without the audio. Unlike previous works that have focussed on recognising a limited …

Cited by 821 Related articles All 15 versions

[PDF] arxiv.org

Lipreading using temporal convolutional networks

B Martinez, P Ma, S Petridis… - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org

Lip-reading has attracted a lot of research attention lately thanks to advances in deep
learning. The current state-of-the-art model for recognition of isolated words in-the-wild …

Cited by 262 Related articles All 3 versions

[PDF] ox.ac.uk

Lip reading in the wild

JS Chung, A Zisserman - Computer Vision–ACCV 2016: 13th Asian …, 2017 - Springer

Our aim is to recognise the words being spoken by a talking face, given only the video but
not the audio. Existing works in this area have focussed on trying to recognise a small …

Cited by 799 Related articles All 6 versions

[PDF] thecvf.com

Lip reading sentences in the wild

J Son Chung, A Senior, O Vinyals… - Proceedings of the …, 2017 - openaccess.thecvf.com

The goal of this work is to recognise phrases and sentences being spoken by a talking face,
with or without the audio. Unlike previous works that have focussed on recognising a limited …

Cited by 918 Related articles All 20 versions

[PDF] neurips.cc

Phased lstm: Accelerating recurrent network training for long or event-based sequences

D Neil, M Pfeiffer, SC Liu - Advances in neural information …, 2016 - proceedings.neurips.cc

Recurrent Neural Networks (RNNs) have become the state-of-the-art choice for
extracting patterns from temporal sequences. Current RNN models are ill suited to process …

Cited by 531 Related articles All 11 versions

[PDF] arxiv.org

Lipnet: End-to-end sentence-level lipreading

YM Assael, B Shillingford, S Whiteson… - arXiv preprint arXiv …, 2016 - arxiv.org

Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional
approaches separated the problem into two stages: designing or learning visual features …

Cited by 453 Related articles All 6 versions

[PDF] mdpi.com

Audio-visual speech and gesture recognition by sensors of mobile devices

D Ryumin, D Ivanko, E Ryumina - Sensors, 2023 - mdpi.com

Audio-visual speech recognition (AVSR) is one of the most promising solutions for reliable
speech recognition, particularly when audio is corrupted by noise. Additional visual …

Cited by 44 Related articles All 9 versions

[PDF] arxiv.org

End-to-end audiovisual speech recognition

S Petridis, T Stafylakis, P Ma, F Cai… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org

Several end-to-end deep learning approaches have been recently presented which extract
either audio or visual features from the input images or audio signals and perform speech …

Cited by 309 Related articles All 12 versions