Context-aware transformer transducer for speech recognition
End-to-end (E2E) automatic speech recognition (ASR) systems often have difficulty
recognizing uncommon words that appear infrequently in the training data. One promising …
Contextual RNN-T for open domain ASR
End-to-end (E2E) systems for automatic speech recognition (ASR), such as RNN
Transducer (RNN-T) and Listen-Attend-Spell (LAS), blend the individual components of a …
Dual application of speech enhancement for automatic speech recognition
In this work, we exploit speech enhancement for improving a recurrent neural network
transducer (RNN-T) based ASR system. We employ a dense convolutional recurrent …
Tree-constrained pointer generator for end-to-end contextual speech recognition
Contextual knowledge is important for real-world automatic speech recognition (ASR)
applications. In this paper, a novel tree-constrained pointer generator (TCPGen) component …
Tree-constrained pointer generator with graph neural network encodings for contextual speech recognition
Incorporating biasing words obtained as contextual knowledge is critical for many automatic
speech recognition (ASR) applications. This paper proposes the use of graph neural …
Minimising biasing word errors for contextual ASR with the tree-constrained pointer generator
Contextual knowledge is essential for reducing speech recognition errors on high-valued
long-tail words. This paper proposes a novel tree-constrained pointer generator (TCPGen) …
Towards effective and compact contextual representation for conformer transducer speech recognition systems
Current ASR systems are mainly trained and evaluated at the utterance level. Long range
cross utterance context can be incorporated. A key task is to derive a suitable compact …
Benchmarking LF-MMI, CTC and RNN-T criteria for streaming ASR
In this work, to measure the accuracy and efficiency for a latency-controlled streaming
automatic speech recognition (ASR) application, we perform comprehensive evaluations on …
Two-step joint optimization with auxiliary loss function for noise-robust speech recognition
In this paper, a new two-step joint optimization approach based on the asynchronous
subregion optimization method is proposed for training a pipeline model composed of two …
Graph neural networks for contextual ASR with the tree-constrained pointer generator
Incorporating biasing words obtained through contextual knowledge is paramount in
automatic speech recognition (ASR) applications. This paper proposes an innovative …