Diagonal state space augmented transformers for speech recognition

G Saon, A Gupta, X Cui - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
We improve on the popular conformer architecture by replacing the depthwise temporal
convolutions with diagonal state space (DSS) models. DSS is a recently introduced variant …

StegoType: Surface Typing from Egocentric Cameras

M Richardson, F Botros, Y Shi, P Guo… - Proceedings of the 37th …, 2024 - dl.acm.org
Text input is a critical component of any general purpose computing system, yet efficient and
natural text input remains a challenge in AR and VR. Headset based hand-tracking has …

Learning ASR pathways: A sparse multilingual ASR model

M Yang, A Tjandra, C Liu, D Zhang… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Neural network pruning compresses automatic speech recognition (ASR) models effectively.
However, in multilingual ASR, language-agnostic pruning may lead to severe performance …

Learning a dual-mode speech recognition model via self-pruning

C Liu, Y Shangguan, H Yang, Y Shi… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
There is growing interest in unifying the streaming and full-context automatic speech
recognition (ASR) networks into a single end-to-end ASR model to simplify the model …

Deliberation model for on-device spoken language understanding

D Le, A Shrivastava, P Tomasello, S Kim… - arXiv preprint arXiv …, 2022 - arxiv.org
We propose a novel deliberation-based approach to end-to-end (E2E) spoken language
understanding (SLU), where a streaming automatic speech recognition (ASR) model …

Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding

S Kim, A Shrivastava, D Le, J Lin, O Kalinli… - arXiv preprint arXiv …, 2023 - arxiv.org
End-to-end (E2E) spoken language understanding (SLU) systems that generate a semantic
parse from speech have become more promising recently. This approach uses a single …

Joint Federated Learning and Personalization for on-Device ASR

J Jia, K Li, M Malek, K Malik… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
In this paper, we propose a joint federated learning (FL) and personalization method for
on-device ASR adaptation. Starting with a Conformer-based RNN-T as the ASR model …

CCE-Net: Causal Convolution Embedding Network for Streaming Automatic Speech Recognition

F Deng, Y Ming, B Lyu - International Journal of Network Dynamics and …, 2024 - sciltp.com
Streaming Automatic Speech Recognition (ASR) has gained significant attention across
various application scenarios, including video conferencing, live sports events, and …

UFO2: A unified pre-training framework for online and offline speech recognition

L Fu, S Li, Q Li, L Deng, F Li, L Fan… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
In this paper, we propose a Unified pre-training Framework for Online and Offline (UFO2)
Automatic Speech Recognition (ASR), which 1) simplifies the two separate training …

Factorized blank thresholding for improved runtime efficiency of neural transducers

D Le, F Seide, Y Wang, Y Li, K Schubert… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
We show how factoring the RNN-T's output distribution can significantly reduce the
computation cost and power consumption for on-device ASR inference with no loss in …