Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
Contextualized streaming end-to-end speech recognition with trie-based deep biasing and shallow fusion
How to leverage dynamic contextual information in end-to-end speech recognition has
remained an active research area. Previous solutions to this problem were either designed …
Towards measuring fairness in speech recognition: Casual conversations dataset transcriptions
The problem of machine learning systems demonstrating bias towards specific groups of
individuals has been studied extensively, particularly in the Facial Recognition area, but …
A study of transducer based end-to-end ASR with ESPnet: Architecture, auxiliary loss and decoding strategies
F Boyer, Y Shinohara, T Ishii… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
In this study, we present recent developments of models trained with the RNN-T loss in
ESPnet. It involves the use of various architectures such as recently proposed Conformer …
Accelerating RNN-T training and inference using CTC guidance
We propose a novel method to accelerate training and inference process of recurrent neural
network transducer (RNN-T) based on the guidance from a co-trained connectionist …
Scaling ASR improves zero and few shot learning
With 4.5 million hours of English speech from 10 different sources across 120 countries and
models of up to 10 billion parameters, we explore the frontiers of scale for automatic speech …
Time-synchronous one-pass beam search for parallel online and offline transducers with dynamic block training
End-to-end automatic speech recognition (ASR) has become an increasingly popular area
of research, with two main models being online and offline ASR. Online models aim to …
Semantic distance: A new metric for ASR performance analysis towards spoken language understanding
Word Error Rate (WER) has been the predominant metric used to evaluate the performance
of automatic speech recognition (ASR) systems. However, WER is sometimes not a good …
Dissecting user-perceived latency of on-device E2E speech recognition
As speech-enabled devices such as smartphones and smart speakers become increasingly
ubiquitous, there is growing interest in building automatic speech recognition (ASR) systems …
Robust acoustic and semantic contextual biasing in neural transducers for speech recognition
Attention-based contextual biasing approaches have shown significant improvements in the
recognition of generic and/or personal rare-words in End-to-End Automatic Speech …