Contextualized streaming end-to-end speech recognition with trie-based deep biasing and shallow fusion
How to leverage dynamic contextual information in end-to-end speech recognition has
remained an active research area. Previous solutions to this problem were either designed …
Internal language model training for domain-adaptive end-to-end speech recognition
The efficacy of external language model (LM) integration with existing end-to-end (E2E)
automatic speech recognition (ASR) systems can be improved significantly using the …
Modeling spoken information queries for virtual assistants: Open problems, challenges and opportunities
C Van Gysel - Proceedings of the 46th International ACM SIGIR …, 2023 - dl.acm.org
Virtual assistants are becoming increasingly important speech-driven Information Retrieval
platforms that assist users with various tasks. We discuss open problems and challenges …
Bayesian neural network language modeling for speech recognition
State-of-the-art neural network language models (NNLMs) represented by long short term
memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly …
Tree-constrained pointer generator for end-to-end contextual speech recognition
Contextual knowledge is important for real-world automatic speech recognition (ASR)
applications. In this paper, a novel tree-constrained pointer generator (TCPGen) component …
Contextualized end-to-end speech recognition with contextual phrase prediction network
Contextual information plays a crucial role in speech recognition technologies and
incorporating it into the end-to-end speech recognition models has drawn immense interest …
Semantic distance: A new metric for ASR performance analysis towards spoken language understanding
Word Error Rate (WER) has been the predominant metric used to evaluate the performance
of automatic speech recognition (ASR) systems. However, WER is sometimes not a good …
Dissecting user-perceived latency of on-device E2E speech recognition
As speech-enabled devices such as smartphones and smart speakers become increasingly
ubiquitous, there is growing interest in building automatic speech recognition (ASR) systems …
Adaptive contextual biasing for transducer based streaming speech recognition
By incorporating additional contextual information, deep biasing methods have emerged as
a promising solution for speech recognition of personalized words. However, for real-world …
Minimising biasing word errors for contextual ASR with the tree-constrained pointer generator
Contextual knowledge is essential for reducing speech recognition errors on high-valued
long-tail words. This paper proposes a novel tree-constrained pointer generator (TCPGen) …