Exploring the integration of large language models into automatic speech recognition systems: An empirical study
This paper explores the integration of Large Language Models (LLMs) into Automatic
Speech Recognition (ASR) systems to improve transcription accuracy. The increasing …
Improving large-scale deep biasing with phoneme features and text-only data in streaming transducer
Deep biasing for the Transducer can improve the recognition performance of rare words or
contextual entities, which is essential in practical applications, especially for streaming …
Contextual Spelling Correction with Large Language Models
Contextual Spelling Correction (CSC) models are used to improve automatic speech
recognition (ASR) quality given user-specific context. Typically, context is modeled as a large …
Server-side rescoring of spoken entity-centric knowledge queries for virtual assistants
Y Zhang, S Gondala, T Fraga-Silva… - International Journal of …, 2024 - Springer
On-device virtual assistants (VAs) powered by automatic speech recognition (ASR) require
effective knowledge integration for the challenging entity-rich query recognition. In this …
Integrating lattice-free MMI into end-to-end speech recognition
In automatic speech recognition (ASR) research, discriminative criteria have achieved
superior performance in DNN-HMM systems. Given this success, the adoption of …
O-1: Self-training with Oracle and 1-best Hypothesis
We introduce O-1, a new self-training objective to reduce training bias and unify training and
evaluation metrics for speech recognition. O-1 is a faster variant of Expected Minimum …
Effective internal language model training and fusion for factorized transducer model
The internal language model (ILM) of the neural transducer has been widely studied. In most
prior work, it is mainly used for estimating the ILM score and is subsequently subtracted …
Correction Focused Language Model Training For Speech Recognition
Language models (LMs) have been commonly adopted to boost the performance of
automatic speech recognition (ASR) particularly in domain adaptation tasks. Conventional …
Improving Rare Words Recognition through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition
Homophone characters are common in tonal syllable-based languages, such as Mandarin
and Cantonese. The data-intensive end-to-end Automatic Speech Recognition (ASR) …
Spelling Correction through Rewriting of Non-Autoregressive ASR Lattices
L Velikovich, C Li, D Caseiro, S Kumar… - arXiv preprint arXiv …, 2024 - arxiv.org
For end-to-end Automatic Speech Recognition (ASR) models, recognizing personal or rare
phrases can be hard. A promising way to improve accuracy is through spelling correction (or …