BigSSL: Exploring the frontier of large-scale semi-supervised learning for automatic speech recognition
We summarize the results of a host of efforts using giant automatic speech recognition (ASR)
models pre-trained using large, diverse unlabeled datasets containing approximately a …
JOIST: A joint speech and text streaming model for ASR
We present JOIST, an algorithm to train a streaming, cascaded, encoder end-to-end (E2E)
model with both speech-text paired inputs, and text-only unpaired inputs. Unlike previous …
Diagonal state space augmented transformers for speech recognition
We improve on the popular conformer architecture by replacing the depthwise temporal
convolutions with diagonal state space (DSS) models. DSS is a recently introduced variant …
Modular hybrid autoregressive transducer
Text-only adaptation of a transducer model remains challenging for end-to-end speech
recognition since the transducer has no clearly separated acoustic model (AM), language …
NAM+: Towards scalable end-to-end contextual biasing for adaptive ASR
Attention-based biasing techniques for end-to-end ASR systems are able to achieve large
accuracy gains without requiring the inference algorithm adjustments and parameter tuning …
Modular domain adaptation for conformer-based streaming ASR
Speech data from different domains has distinct acoustic and linguistic characteristics. It is
common to train a single multidomain model such as a Conformer transducer for speech …
A unified cascaded encoder ASR model for dynamic model sizes
In this paper, we propose a dynamic cascaded encoder Automatic Speech Recognition
(ASR) model, which unifies models for different deployment scenarios. Moreover, the model …
Learning a dual-mode speech recognition model via self-pruning
There is growing interest in unifying the streaming and full-context automatic speech
recognition (ASR) networks into a single end-to-end ASR model to simplify the model …
Improving deliberation by text-only and semi-supervised training
Text-only and semi-supervised training based on audio-only data has gained popularity
recently due to the wide availability of unlabeled text and speech data. In this work, we …
Sub-8-bit quantization for on-device speech recognition: A regularization-free approach
For on-device automatic speech recognition (ASR), quantization aware training (QAT) is
ubiquitous to achieve the trade-off between model predictive performance and efficiency …