Context-aware transformer transducer for speech recognition

FJ Chang, J Liu, M Radfar, A Mouchtaris… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
End-to-end (E2E) automatic speech recognition (ASR) systems often have difficulty
recognizing uncommon words that appear infrequently in the training data. One promising …

Contextual RNN-T for open domain ASR

M Jain, G Keren, J Mahadeokar, G Zweig… - arXiv preprint arXiv …, 2020 - arxiv.org
End-to-end (E2E) systems for automatic speech recognition (ASR), such as RNN
Transducer (RNN-T) and Listen-Attend-Spell (LAS) blend the individual components of a …

Dual application of speech enhancement for automatic speech recognition

A Pandey, C Liu, Y Wang… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
In this work, we exploit speech enhancement for improving a recurrent neural network
transducer (RNN-T) based ASR system. We employ a dense convolutional recurrent …

Tree-constrained pointer generator for end-to-end contextual speech recognition

G Sun, C Zhang, PC Woodland - 2021 IEEE Automatic Speech …, 2021 - ieeexplore.ieee.org
Contextual knowledge is important for real-world automatic speech recognition (ASR)
applications. In this paper, a novel tree-constrained pointer generator (TCPGen) component …

Tree-constrained pointer generator with graph neural network encodings for contextual speech recognition

G Sun, C Zhang, PC Woodland - arXiv preprint arXiv:2207.00857, 2022 - arxiv.org
Incorporating biasing words obtained as contextual knowledge is critical for many automatic
speech recognition (ASR) applications. This paper proposes the use of graph neural …

Minimising biasing word errors for contextual ASR with the tree-constrained pointer generator

G Sun, C Zhang, PC Woodland - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org
Contextual knowledge is essential for reducing speech recognition errors on high-valued
long-tail words. This paper proposes a novel tree-constrained pointer generator (TCPGen) …

Towards effective and compact contextual representation for conformer transducer speech recognition systems

M Cui, J Kang, J Deng, X Yin, Y Xie, X Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Current ASR systems are mainly trained and evaluated at the utterance level. Long range
cross utterance context can be incorporated. A key task is to derive a suitable compact …

Benchmarking LF-MMI, CTC and RNN-T criteria for streaming ASR

X Zhang, F Zhang, C Liu, K Schubert… - 2021 IEEE spoken …, 2021 - ieeexplore.ieee.org
In this work, to measure the accuracy and efficiency for a latency-controlled streaming
automatic speech recognition (ASR) application, we perform comprehensive evaluations on …

Two-step joint optimization with auxiliary loss function for noise-robust speech recognition

GW Lee, HK Kim - Sensors, 2022 - mdpi.com
In this paper, a new two-step joint optimization approach based on the asynchronous
subregion optimization method is proposed for training a pipeline model composed of two …

Graph neural networks for contextual ASR with the tree-constrained pointer generator

G Sun, C Zhang, PC Woodland - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Incorporating biasing words obtained through contextual knowledge is paramount in
automatic speech recognition (ASR) applications. This paper proposes an innovative …