Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Contextualized streaming end-to-end speech recognition with trie-based deep biasing and shallow fusion

D Le, M Jain, G Keren, S Kim, Y Shi… - arXiv preprint arXiv …, 2021 - arxiv.org
How to leverage dynamic contextual information in end-to-end speech recognition has
remained an active research area. Previous solutions to this problem were either designed …

Towards measuring fairness in speech recognition: Casual conversations dataset transcriptions

C Liu, M Picheny, L Sarı, P Chitkara… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
The problem of machine learning systems demonstrating bias towards specific groups of
individuals has been studied extensively, particularly in the Facial Recognition area, but …

A study of transducer based end-to-end ASR with ESPnet: Architecture, auxiliary loss and decoding strategies

F Boyer, Y Shinohara, T Ishii… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
In this study, we present recent developments of models trained with the RNN-T loss in
ESPnet. It involves the use of various architectures such as recently proposed Conformer …

Accelerating RNN-T training and inference using CTC guidance

Y Wang, Z Chen, C Zheng, Y Zhang… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
We propose a novel method to accelerate training and inference process of recurrent neural
network transducer (RNN-T) based on the guidance from a co-trained connectionist …

Scaling ASR improves zero and few shot learning

A Xiao, W Zheng, G Keren, D Le, F Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
With 4.5 million hours of English speech from 10 different sources across 120 countries and
models of up to 10 billion parameters, we explore the frontiers of scale for automatic speech …

Time-synchronous one-pass beam search for parallel online and offline transducers with dynamic block training

Y Sudo, M Shakeel, Y Peng… - Proc. INTERSPEECH …, 2023 - researchgate.net
End-to-end automatic speech recognition (ASR) has become an increasingly popular area
of research, with two main models being online and offline ASR. Online models aim to …

Semantic distance: A new metric for ASR performance analysis towards spoken language understanding

S Kim, A Arora, D Le, CF Yeh, C Fuegen… - arXiv preprint arXiv …, 2021 - arxiv.org
Word Error Rate (WER) has been the predominant metric used to evaluate the performance
of automatic speech recognition (ASR) systems. However, WER is sometimes not a good …

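For reference, the Word Error Rate mentioned in the snippet above is conventionally computed as (S + D + I) / N: the word-level Levenshtein distance (substitutions, deletions, insertions) between a hypothesis and a reference, divided by the number of reference words. The following is a minimal Python sketch that assumes plain whitespace tokenization and no text normalization. The function name `word_error_rate` is hypothetical, and this is not the scoring code used by any of the papers listed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: (S + D + I) / N over whitespace-split words.

    Assumes no casing or punctuation normalization is needed.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row kept at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # One deletion ("the") and one insertion ("please") over 4 reference words.
    print(word_error_rate("play the next song", "play next song please"))  # 0.5
```

This sketch charges every substituted, deleted, or inserted word the same unit cost, regardless of how much that error changes the meaning of the utterance.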
Dissecting user-perceived latency of on-device E2E speech recognition

Y Shangguan, R Prabhavalkar, H Su… - arXiv preprint arXiv …, 2021 - arxiv.org
As speech-enabled devices such as smartphones and smart speakers become increasingly
ubiquitous, there is growing interest in building automatic speech recognition (ASR) systems …

Robust acoustic and semantic contextual biasing in neural transducers for speech recognition

X Fu, KM Sathyendra, A Gandhe, J Liu… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Attention-based contextual biasing approaches have shown significant improvements in the
recognition of generic and/or personal rare-words in End-to-End Automatic Speech …