SLM: Bridge the thin gap between speech and text foundation models
We present a joint Speech and Language Model (SLM), a multitask, multilingual, and dual-
modal model that takes advantage of pretrained foundational speech and language models …
Improving contextual biasing with text injection
In this work, we present a model-based approach to improving contextual biasing that
improves quality without drastically increasing model computation during inference …
Dual-mode NAM: Effective top-k context injection for end-to-end ASR
ASR systems in real applications must be adapted on the fly to correctly recognize task-
specific contextual terms, such as contacts, application names and media entities. However …
Contextual Spelling Correction with Large Language Models
Contextual Spelling Correction (CSC) models are used to improve automatic speech
recognition (ASR) quality given user-specific context. Typically, context is modeled as a large …
Exploration of Adapter for Noise Robust Automatic Speech Recognition
H Shi, T Kawahara - arXiv preprint arXiv:2402.18275, 2024 - arxiv.org
Adapting a robust automatic speech recognition (ASR) system to tackle unseen noise
scenarios is crucial. Integrating adapters into neural networks has emerged as a potent …
SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings
A Antonova, E Bakhturina, B Ginsburg - arXiv preprint arXiv:2306.02317, 2023 - arxiv.org
Contextual spelling correction models are an alternative to shallow fusion to improve
automatic speech recognition (ASR) quality given user vocabulary. To deal with large user …
Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR
Contextual biasing enables speech recognizers to transcribe important phrases in the
speaker's context, such as contact names, even if they are rare in, or absent from, the …
Contextual biasing with the Knuth-Morris-Pratt matching algorithm
Contextual biasing refers to the problem of biasing the automatic speech recognition (ASR)
systems towards rare entities that are relevant to the specific user or application scenarios …
Speech-enriched Memory for Inference-time Adaptation of ASR Models to Word Dictionaries
Despite the impressive performance of ASR models on mainstream benchmarks, their
performance on rare words is unsatisfactory. In enterprise settings, often a focused list of …
Optimizing Large-Scale Context Retrieval for End-to-End ASR
Contextual Automatic Speech Recognition (ASR) requires scalable and accurate
retrieval of content relevant to the user's context. This paper presents a comparative study of …