Deep audio-visual learning: A survey
Audio-visual learning, aimed at exploiting the relationship between audio and visual
modalities, has drawn considerable attention since deep learning started to be used …
modalities, has drawn considerable attention since deep learning started to be used …
[HTML][HTML] An overview on data representation learning: From traditional feature learning to recent deep learning
Since about 100 years ago, to learn the intrinsic structure of data, many representation
learning approaches have been proposed, either linear or nonlinear, either supervised or …
learning approaches have been proposed, either linear or nonlinear, either supervised or …
Deep audio-visual speech recognition
The goal of this work is to recognise phrases and sentences being spoken by a talking face,
with or without the audio. Unlike previous works that have focussed on recognising a limited …
with or without the audio. Unlike previous works that have focussed on recognising a limited …
Lipreading using temporal convolutional networks
Lip-reading has attracted a lot of research attention lately thanks to advances in deep
learning. The current state-of-the-art model for recognition of isolated words in-the-wild …
learning. The current state-of-the-art model for recognition of isolated words in-the-wild …
Lip reading in the wild
JS Chung, A Zisserman - Computer Vision–ACCV 2016: 13th Asian …, 2017 - Springer
Our aim is to recognise the words being spoken by a talking face, given only the video but
not the audio. Existing works in this area have focussed on trying to recognise a small …
not the audio. Existing works in this area have focussed on trying to recognise a small …
Lip reading sentences in the wild
The goal of this work is to recognise phrases and sentences being spoken by a talking face,
with or without the audio. Unlike previous works that have focussed on recognising a limited …
with or without the audio. Unlike previous works that have focussed on recognising a limited …
Phased lstm: Accelerating recurrent network training for long or event-based sequences
Abstract Recurrent Neural Networks (RNNs) have become the state-of-the-art choice for
extracting patterns from temporal sequences. Current RNN models are ill suited to process …
extracting patterns from temporal sequences. Current RNN models are ill suited to process …
Lipnet: End-to-end sentence-level lipreading
Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional
approaches separated the problem into two stages: designing or learning visual features …
approaches separated the problem into two stages: designing or learning visual features …
Audio-visual speech and gesture recognition by sensors of mobile devices
Audio-visual speech recognition (AVSR) is one of the most promising solutions for reliable
speech recognition, particularly when audio is corrupted by noise. Additional visual …
speech recognition, particularly when audio is corrupted by noise. Additional visual …
End-to-end audiovisual speech recognition
Several end-to-end deep learning approaches have been recently presented which extract
either audio or visual features from the input images or audio signals and perform speech …
either audio or visual features from the input images or audio signals and perform speech …