Improved neural language model fusion for streaming recurrent neural network transducer

Contextualized streaming end-to-end speech recognition with trie-based deep biasing and shallow fusion

D Le, M Jain, G Keren, S Kim, Y Shi… - arXiv preprint arXiv …, 2021 - arxiv.org

How to leverage dynamic contextual information in end-to-end speech recognition has
remained an active research area. Previous solutions to this problem were either designed …

Cited by 73 · Related articles · All 5 versions

Internal language model training for domain-adaptive end-to-end speech recognition

Z Meng, N Kanda, Y Gaur… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

The efficacy of external language model (LM) integration with existing end-to-end (E2E)
automatic speech recognition (ASR) systems can be improved significantly using the …

Cited by 51 · Related articles · All 4 versions

Modeling spoken information queries for virtual assistants: Open problems, challenges and opportunities

C Van Gysel - Proceedings of the 46th International ACM SIGIR …, 2023 - dl.acm.org

Virtual assistants are becoming increasingly important speech-driven Information Retrieval
platforms that assist users with various tasks. We discuss open problems and challenges …

Cited by 4 · Related articles · All 3 versions

Bayesian neural network language modeling for speech recognition

B Xue, S Hu, J Xu, M Geng, X Liu… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org

State-of-the-art neural network language models (NNLMs) represented by long short term
memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly …

Cited by 14 · Related articles · All 5 versions

Tree-constrained pointer generator for end-to-end contextual speech recognition

G Sun, C Zhang, PC Woodland - 2021 IEEE Automatic Speech …, 2021 - ieeexplore.ieee.org

Contextual knowledge is important for real-world automatic speech recognition (ASR)
applications. In this paper, a novel tree-constrained pointer generator (TCPGen) component …

Cited by 25 · Related articles · All 3 versions

Contextualized end-to-end speech recognition with contextual phrase prediction network

K Huang, A Zhang, Z Yang, P Guo, B Mu, T Xu… - arXiv preprint arXiv …, 2023 - arxiv.org

Contextual information plays a crucial role in speech recognition technologies and
incorporating it into the end-to-end speech recognition models has drawn immense interest …

Cited by 12 · Related articles · All 4 versions

Semantic distance: A new metric for ASR performance analysis towards spoken language understanding

S Kim, A Arora, D Le, CF Yeh, C Fuegen… - arXiv preprint arXiv …, 2021 - arxiv.org

Word Error Rate (WER) has been the predominant metric used to evaluate the performance
of automatic speech recognition (ASR) systems. However, WER is sometimes not a good …

Cited by 24 · Related articles · All 6 versions

Dissecting user-perceived latency of on-device E2E speech recognition

Y Shangguan, R Prabhavalkar, H Su… - arXiv preprint arXiv …, 2021 - arxiv.org

As speech-enabled devices such as smartphones and smart speakers become increasingly
ubiquitous, there is growing interest in building automatic speech recognition (ASR) systems …

Cited by 24 · Related articles · All 7 versions

Adaptive contextual biasing for transducer based streaming speech recognition

T Xu, Z Yang, K Huang, P Guo, A Zhang, B Li… - arXiv preprint arXiv …, 2023 - arxiv.org

By incorporating additional contextual information, deep biasing methods have emerged as
a promising solution for speech recognition of personalized words. However, for real-world …

Cited by 8 · Related articles · All 5 versions

Minimising biasing word errors for contextual ASR with the tree-constrained pointer generator

G Sun, C Zhang, PC Woodland - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org

Contextual knowledge is essential for reducing speech recognition errors on high-valued
long-tail words. This paper proposes a novel tree-constrained pointer generator (TCPGen) …

Cited by 12 · Related articles · All 4 versions