Performance evaluation of deep neural networks applied to speech recognition: RNN, LSTM and GRU

A Shewalkar, D Nyavanandi, SA Ludwig - Journal of Artificial …, 2019 - sciendo.com
Deep Neural Networks (DNN) are nothing but neural networks with many hidden
layers. DNNs are becoming popular in automatic speech recognition tasks which combines …

Semi-orthogonal low-rank matrix factorization for deep neural networks.

D Povey, G Cheng, Y Wang, K Li, H Xu… - Interspeech, 2018 - academia.edu
Time Delay Neural Networks (TDNNs), also known as one-dimensional
Convolutional Neural Networks (1-d CNNs), are an efficient and well-performing neural …

Purely sequence-trained neural networks for ASR based on lattice-free MMI.

D Povey, V Peddinti, D Galvez, P Ghahremani… - Interspeech, 2016 - isca-archive.org
In this paper we describe a method to perform sequence-discriminative training of neural
network acoustic models without the need for frame-level cross-entropy pre-training. We use …

A time delay neural network architecture for efficient modeling of long temporal contexts.

V Peddinti, D Povey, S Khudanpur - Interspeech, 2015 - isca-archive.org
Recurrent neural network architectures have been shown to efficiently model long term
temporal dependencies between acoustic events. However the training time of recurrent …

A pruned RNNLM lattice-rescoring algorithm for automatic speech recognition

H Xu, T Chen, D Gao, Y Wang, K Li… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
Lattice-rescoring is a common approach to take advantage of recurrent neural language
models in ASR, where a word-lattice is generated from 1st-pass decoding and the lattice is …

JHU ASpIRE system: Robust LVCSR with TDNNs, iVector adaptation and RNN-LMs

V Peddinti, G Chen, V Manohar, T Ko… - … IEEE Workshop on …, 2015 - ieeexplore.ieee.org
Multi-style training, using data which emulates a variety of possible test scenarios, is a
popular approach towards robust acoustic modeling. However acoustic models capable of …

Building state-of-the-art distant speech recognition using the CHiME-4 challenge with a setup of speech enhancement baseline

SJ Chen, AS Subramanian, H Xu… - arXiv preprint arXiv …, 2018 - arxiv.org
This paper describes a new baseline system for automatic speech recognition (ASR) in the
CHiME-4 challenge to promote the development of noisy ASR in speech processing …

Neural network language modeling with letter-based features and importance sampling

H Xu, K Li, Y Wang, J Wang, S Kang… - … on acoustics, speech …, 2018 - ieeexplore.ieee.org
In this paper we describe an extension of the Kaldi software toolkit to support neural-based
language modeling, intended for use in automatic speech recognition (ASR) and related …

Wake word detection with streaming transformers

Y Wang, H Lv, D Povey, L Xie… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Modern wake word detection systems usually rely on neural networks for acoustic modeling.
Transformers has recently shown superior performance over LSTM and convolutional …

Recurrent neural network language model adaptation for conversational speech recognition.

K Li, H Xu, Y Wang, D Povey, S Khudanpur - Interspeech, 2018 - danielpovey.com
We propose two adaptation models for recurrent neural network language models
(RNNLMs) to capture topic effects and long-distance triggers for conversational automatic …