Audio-visual speech recognition using deep learning

K Noda, Y Yamaguchi, K Nakadai, HG Okuno… - Applied intelligence, 2015 - Springer
An audio-visual speech recognition (AVSR) system is thought to be one of the most promising
solutions for reliable speech recognition, particularly when the audio is corrupted by noise …

Weakly supervised learning with multi-stream CNN-LSTM-HMMs to discover sequential parallelism in sign language videos

O Koller, NC Camgoz, H Ney… - IEEE transactions on …, 2019 - ieeexplore.ieee.org
In this work we present a new approach to the field of weakly supervised learning in the
video domain. Our method is relevant to sequence learning problems which can be split up …

Method and configuration for determining a descriptive feature of a speech signal

M Holzapfel - US Patent 6,523,005, 2003 - Google Patents
A method and also a configuration for determining a descriptive feature of a speech signal,
in which a first speech model is trained with a first time pattern and a second speech model …

Recent advances in the multi-stream HMM/ANN hybrid approach to noise robust ASR

A Hagen, A Morris - Computer Speech & Language, 2005 - Elsevier
In this article we review several successful extensions to the standard hidden-Markov-
model/artificial neural network (HMM/ANN) hybrid, which have recently made important …

Subband-based speech recognition

H Bourlard, S Dupont - 1997 IEEE International Conference on …, 1997 - ieeexplore.ieee.org
In the framework of hidden Markov models (HMM) or hybrid HMM/artificial neural network
(ANN) systems, we present a new approach towards automatic speech recognition (ASR) …

A multiobjective learning and ensembling approach to high-performance speech enhancement with compact neural network architectures

Q Wang, J Du, LR Dai, CH Lee - IEEE/ACM Transactions on …, 2018 - ieeexplore.ieee.org
In this study, we propose a novel deep neural network (DNN) architecture for speech
enhancement (SE) via a multiobjective learning and ensembling (MOLE) framework to …

Some solutions to the missing feature problem in data classification, with application to noise robust ASR

AC Morris, MP Cooke, PD Green - Proceedings of the 1998 …, 1998 - ieeexplore.ieee.org
We address the theoretical and practical issues involved in automatic speech recognition
(ASR) when some of the observation data for the target signal is masked by other signals …

Opportunities and challenges of parallelizing speech recognition

J Chong, G Friedland, A Janin, N Morgan… - Proceedings of the 2nd …, 2010 - usenix.org
Automatic speech recognition enables a wide range of current and emerging applications
such as automatic transcription, multimedia content analysis, and natural human-computer …

An HMM-based framework for video semantic analysis

G Xu, YF Ma, HJ Zhang, SQ Yang - IEEE Transactions on …, 2005 - ieeexplore.ieee.org
Video semantic analysis is essential in video indexing and structuring. However, due to the
lack of robust and generic algorithms, most of the existing works on semantic analysis are …

An articulatory feature-based tandem approach and factored observation modeling

O Cetin, A Kantor, S King, C Bartels… - … , Speech and Signal …, 2007 - ieeexplore.ieee.org
The so-called tandem approach, where the posteriors of a multilayer perceptron (MLP)
classifier are used as features in an automatic speech recognition (ASR) system, has proven …