[PDF][PDF] Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Adaptation algorithms for neural network-based speech recognition: An overview

P Bell, J Fainberg, O Klejch, J Li… - IEEE Open Journal …, 2020 - ieeexplore.ieee.org
We present a structured overview of adaptation algorithms for neural network-based speech
recognition, considering both hybrid hidden Markov model/neural network systems and end …

Wenet 2.0: More productive end-to-end speech recognition toolkit

B Zhang, D Wu, Z Peng, X Song, Z Yao, H Lv… - arXiv preprint arXiv …, 2022 - arxiv.org
Recently, we made available WeNet, a production-oriented end-to-end speech recognition
toolkit, which introduces a unified two-pass (U2) framework and a built-in runtime to address …

Contextual adapters for personalized speech recognition in neural transducers

KM Sathyendra, T Muniyappa, FJ Chang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Personal rare word recognition in end-to-end Automatic Speech Recognition (E2E ASR)
models is a challenge due to the lack of training data. A standard way to address this issue …

Context-aware transformer transducer for speech recognition

FJ Chang, J Liu, M Radfar, A Mouchtaris… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
End-to-end (E2E) automatic speech recognition (ASR) systems often have difficulty
recognizing uncommon words, that appear infrequently in the training data. One promising …

Contextualized streaming end-to-end speech recognition with trie-based deep biasing and shallow fusion

D Le, M Jain, G Keren, S Kim, Y Shi… - arXiv preprint arXiv …, 2021 - arxiv.org
How to leverage dynamic contextual information in end-to-end speech recognition has
remained an active research area. Previous solutions to this problem were either designed …

Deep shallow fusion for RNN-T personalization

D Le, G Keren, J Chan, J Mahadeokar… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
End-to-end models in general, and Recurrent Neural Network Transducer (RNN-T) in
particular, have gained significant traction in the automatic speech recognition community in …

Improving end-to-end contextual speech recognition with fine-grained contextual knowledge selection

M Han, L Dong, Z Liang, M Cai, S Zhou… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Nowadays, most methods for end-to-end contextual speech recognition bias the recognition
process towards contextual knowledge. Since all-neural contextual biasing methods rely on …

Personalization of ctc speech recognition models

S Dingliwal, M Sunkara, S Ronanki… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
End-to-end speech recognition models trained using joint Connectionist Temporal
Classification (CTC)-Attention loss have gained popularity recently. In these models, a non …

Nam+: Towards scalable end-to-end contextual biasing for adaptive asr

T Munkhdalai, Z Wu, G Pundak, KC Sim… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Attention-based biasing techniques for end-to-end ASR systems are able to achieve large
accuracy gains without requiring the inference algorithm adjustments and parameter tuning …