Alignment restricted streaming recurrent neural network transducer

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com

Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Cited by 325 · Related articles · All 7 versions

Contextualized streaming end-to-end speech recognition with trie-based deep biasing and shallow fusion

D Le, M Jain, G Keren, S Kim, Y Shi… - arXiv preprint arXiv …, 2021 - arxiv.org

How to leverage dynamic contextual information in end-to-end speech recognition has
remained an active research area. Previous solutions to this problem were either designed …

Cited by 73 · Related articles · All 5 versions

Towards measuring fairness in speech recognition: Casual conversations dataset transcriptions

C Liu, M Picheny, L Sarı, P Chitkara… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

The problem of machine learning systems demonstrating bias towards specific groups of
individuals has been studied extensively, particularly in the Facial Recognition area, but …

Cited by 33 · Related articles · All 5 versions

A study of transducer based end-to-end ASR with ESPnet: Architecture, auxiliary loss and decoding strategies

F Boyer, Y Shinohara, T Ishii… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org

In this study, we present recent developments of models trained with the RNN-T loss in
ESPnet. It involves the use of various architectures such as recently proposed Conformer …

Cited by 29 · Related articles · All 5 versions

Accelerating RNN-T training and inference using CTC guidance

Y Wang, Z Chen, C Zheng, Y Zhang… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

We propose a novel method to accelerate training and inference process of recurrent neural
network transducer (RNN-T) based on the guidance from a co-trained connectionist …

Cited by 17 · Related articles · All 3 versions

Scaling ASR improves zero and few shot learning

A Xiao, W Zheng, G Keren, D Le, F Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org

With 4.5 million hours of English speech from 10 different sources across 120 countries and
models of up to 10 billion parameters, we explore the frontiers of scale for automatic speech …

Cited by 21 · Related articles · All 6 versions

Time-synchronous one-pass beam search for parallel online and offline transducers with dynamic block training

Y Sudo, M Shakeel, Y Peng… - Proc. INTERSPEECH …, 2023 - researchgate.net

End-to-end automatic speech recognition (ASR) has become an increasingly popular area
of research, with two main models being online and offline ASR. Online models aim to …

Cited by 6 · Related articles · All 6 versions

Semantic distance: A new metric for ASR performance analysis towards spoken language understanding

S Kim, A Arora, D Le, CF Yeh, C Fuegen… - arXiv preprint arXiv …, 2021 - arxiv.org

Word Error Rate (WER) has been the predominant metric used to evaluate the performance
of automatic speech recognition (ASR) systems. However, WER is sometimes not a good …

Cited by 24 · Related articles · All 6 versions

Dissecting user-perceived latency of on-device E2E speech recognition

Y Shangguan, R Prabhavalkar, H Su… - arXiv preprint arXiv …, 2021 - arxiv.org

As speech-enabled devices such as smartphones and smart speakers become increasingly
ubiquitous, there is growing interest in building automatic speech recognition (ASR) systems …

Cited by 24 · Related articles · All 7 versions

Robust acoustic and semantic contextual biasing in neural transducers for speech recognition

X Fu, KM Sathyendra, A Gandhe, J Liu… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Attention-based contextual biasing approaches have shown significant improvements in the
recognition of generic and/or personal rare-words in End-to-End Automatic Speech …

Cited by 12 · Related articles · All 9 versions