Diagonal state space augmented transformers for speech recognition
We improve on the popular conformer architecture by replacing the depthwise temporal
convolutions with diagonal state space (DSS) models. DSS is a recently introduced variant …
StegoType: Surface Typing from Egocentric Cameras
M Richardson, F Botros, Y Shi, P Guo… - Proceedings of the 37th …, 2024 - dl.acm.org
Text input is a critical component of any general purpose computing system, yet efficient and
natural text input remains a challenge in AR and VR. Headset based hand-tracking has …
Learning ASR pathways: A sparse multilingual ASR model
Neural network pruning compresses automatic speech recognition (ASR) models effectively.
However, in multilingual ASR, language-agnostic pruning may lead to severe performance …
Learning a dual-mode speech recognition model via self-pruning
There is growing interest in unifying the streaming and full-context automatic speech
recognition (ASR) networks into a single end-to-end ASR model to simplify the model …
Deliberation model for on-device spoken language understanding
We propose a novel deliberation-based approach to end-to-end (E2E) spoken language
understanding (SLU), where a streaming automatic speech recognition (ASR) model …
Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding
End-to-end (E2E) spoken language understanding (SLU) systems that generate a semantic
parse from speech have become more promising recently. This approach uses a single …
Joint Federated Learning and Personalization for on-Device ASR
In this paper, we propose a joint federated learning (FL) and personalization method for
on-device ASR adaptation. Starting with a Conformer-based RNN-T as the ASR model …
CCE-Net: Causal Convolution Embedding Network for Streaming Automatic Speech Recognition
F Deng, Y Ming, B Lyu - International Journal of Network Dynamics and …, 2024 - sciltp.com
Streaming Automatic Speech Recognition (ASR) has gained significant attention across
various application scenarios, including video conferencing, live sports events, and …
UFO2: A unified pre-training framework for online and offline speech recognition
L Fu, S Li, Q Li, L Deng, F Li, L Fan… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
In this paper, we propose a Unified pre-training Framework for Online and Offline (UFO2)
Automatic Speech Recognition (ASR), which 1) simplifies the two separate training …
Factorized blank thresholding for improved runtime efficiency of neural transducers
We show how factoring the RNN-T's output distribution can significantly reduce the
computation cost and power consumption for on-device ASR inference with no loss in …