A Comparison of sequence-to-sequence models for speech recognition.
In this work, we conduct a detailed evaluation of various all-neural, end-to-end trained,
sequence-to-sequence models applied to the task of speech recognition. Notably, each of …
Recurrent neural aligner: An encoder-decoder neural network model for sequence to sequence mapping.
We introduce an encoder-decoder recurrent neural network model called Recurrent Neural
Aligner (RNA) that can be used for sequence to sequence mapping tasks. Like connectionist …
Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer
We investigate training end-to-end speech recognition models with the recurrent neural
network transducer (RNN-T): a streaming, all-neural, sequence-to-sequence architecture …
Multi-accent speech recognition with hierarchical grapheme based models
We train grapheme-based acoustic models for speech recognition using a hierarchical
recurrent neural network architecture with connectionist temporal classification (CTC) loss …
Direct acoustics-to-word models for English conversational speech recognition
Recent work on end-to-end automatic speech recognition (ASR) has shown that the
connectionist temporal classification (CTC) loss can be used to convert acoustics to phone …
Towards end-to-end speech recognition with recurrent neural networks
This paper presents a speech recognition system that directly transcribes audio data with
text, without requiring an intermediate phonetic representation. The system is based on a …
A comparison of modeling units in sequence-to-sequence speech recognition with the Transformer on Mandarin Chinese
S Zhou, L Dong, S Xu, B Xu - International Conference on Neural …, 2018 - Springer
The choice of modeling units is critical to automatic speech recognition (ASR) tasks.
Conventional ASR systems typically choose context-dependent states (CD-states) or context …
Lower Frame Rate Neural Network Acoustic Models.
G Pundak, TN Sainath - Interspeech, 2016 - isca-archive.org
Recently, neural network acoustic models trained with Connectionist Temporal Classification
(CTC) were proposed as an alternative approach to conventional cross-entropy trained …
An empirical exploration of CTC acoustic models
The connectionist temporal classification (CTC) loss function has several interesting
properties relevant for automatic speech recognition (ASR): applied on top of deep recurrent …
Recurrent neural network and LSTM models for lexical utterance classification.
SV Ravuri, A Stolcke - Interspeech, 2015 - isca-archive.org
Utterance classification is a critical pre-processing step for many speech understanding and
dialog systems. In multi-user settings, one needs to first identify if an utterance is even …