Exploring the integration of large language models into automatic speech recognition systems: An empirical study

Z Min, J Wang - International Conference on Neural Information …, 2023 - Springer
This paper explores the integration of Large Language Models (LLMs) into Automatic
Speech Recognition (ASR) systems to improve transcription accuracy. The increasing …

Improving large-scale deep biasing with phoneme features and text-only data in streaming transducer

J Qiu, L Huang, B Li, J Zhang, L Lu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Deep biasing for the Transducer can improve the recognition performance of rare words or
contextual entities, which is essential in practical applications, especially for streaming …

Contextual Spelling Correction with Large Language Models

G Song, Z Wu, G Pundak, A Chandorkar… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Contextual Spelling Correction (CSC) models are used to improve automatic speech
recognition (ASR) quality given user-specific context. Typically, context is modeled as a large …

Server-side rescoring of spoken entity-centric knowledge queries for virtual assistants

Y Zhang, S Gondala, T Fraga-Silva… - International Journal of …, 2024 - Springer
On-device virtual assistants (VAs) powered by automatic speech recognition (ASR) require
effective knowledge integration for the challenging entity-rich query recognition. In this …

Integrating lattice-free MMI into end-to-end speech recognition

J Tian, J Yu, C Weng, Y Zou… - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org
In automatic speech recognition (ASR) research, discriminative criteria have achieved
superior performance in DNN-HMM systems. Given this success, the adoption of …

O-1: Self-training with Oracle and 1-best Hypothesis

MK Baskar, A Rosenberg, B Ramabhadran… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce O-1, a new self-training objective to reduce training bias and unify training and
evaluation metrics for speech recognition. O-1 is a faster variant of Expected Minimum …

Effective internal language model training and fusion for factorized transducer model

J Guo, N Moritz, Y Ma, F Seide, C Wu… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
The internal language model (ILM) of the neural transducer has been widely studied. In most
prior work, it is mainly used for estimating the ILM score and is subsequently subtracted …

Correction Focused Language Model Training For Speech Recognition

Y Ma, Z Liu, O Kalinli - ICASSP 2024-2024 IEEE International …, 2024 - ieeexplore.ieee.org
Language models (LMs) have been commonly adopted to boost the performance of
automatic speech recognition (ASR) particularly in domain adaptation tasks. Conventional …

Improving Rare Words Recognition through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition

HL Chung, J Li, P Liu, WK Leung… - … on Chinese Spoken …, 2022 - ieeexplore.ieee.org
Homophone characters are common in tonal syllable-based languages, such as Mandarin
and Cantonese. The data-intensive end-to-end Automatic Speech Recognition (ASR) …

Spelling Correction through Rewriting of Non-Autoregressive ASR Lattices

L Velikovich, C Li, D Caseiro, S Kumar… - arXiv preprint arXiv …, 2024 - arxiv.org
For end-to-end Automatic Speech Recognition (ASR) models, recognizing personal or rare
phrases can be hard. A promising way to improve accuracy is through spelling correction (or …