Streaming transformer transducer based speech recognition using non-causal convolution

G Saon, A Gupta, X Cui - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

We improve on the popular conformer architecture by replacing the depthwise temporal
convolutions with diagonal state space (DSS) models. DSS is a recently introduced variant …

Cited by 27 Related articles All 4 versions

[PDF] acm.org

StegoType: Surface Typing from Egocentric Cameras

M Richardson, F Botros, Y Shi, P Guo… - Proceedings of the 37th …, 2024 - dl.acm.org

Text input is a critical component of any general purpose computing system, yet efficient and
natural text input remains a challenge in AR and VR. Headset based hand-tracking has …

Cited by 1 Related articles All 2 versions

[PDF] arxiv.org

Learning ASR pathways: A sparse multilingual ASR model

M Yang, A Tjandra, C Liu, D Zhang… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Neural network pruning compresses automatic speech recognition (ASR) models effectively.
However, in multilingual ASR, language-agnostic pruning may lead to severe performance …

Cited by 13 Related articles All 3 versions

[PDF] arxiv.org

Learning a dual-mode speech recognition model via self-pruning

C Liu, Y Shangguan, H Yang, Y Shi… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org

There is growing interest in unifying the streaming and full-context automatic speech
recognition (ASR) networks into a single end-to-end ASR model to simplify the model …

Cited by 9 Related articles All 3 versions

[PDF] arxiv.org

Deliberation model for on-device spoken language understanding

D Le, A Shrivastava, P Tomasello, S Kim… - arXiv preprint arXiv …, 2022 - arxiv.org

We propose a novel deliberation-based approach to end-to-end (E2E) spoken language
understanding (SLU), where a streaming automatic speech recognition (ASR) model …

Cited by 14 Related articles All 5 versions

[PDF] arxiv.org

Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding

S Kim, A Shrivastava, D Le, J Lin, O Kalinli… - arXiv preprint arXiv …, 2023 - arxiv.org

End-to-end (E2E) spoken language understanding (SLU) systems that generate a semantic
parse from speech have become more promising recently. This approach uses a single …

Cited by 3 Related articles All 5 versions

Joint Federated Learning and Personalization for on-Device ASR

J Jia, K Li, M Malek, K Malik… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

In this paper, we propose a joint federated learning (FL) and personalization method for
on-device ASR adaptation. Starting with a Conformer-based RNN-T as the ASR model …

Cited by 2 Related articles

[HTML] sciltp.com

[HTML] CCE-Net: Causal Convolution Embedding Network for Streaming Automatic Speech Recognition

F Deng, Y Ming, B Lyu - International Journal of Network Dynamics and …, 2024 - sciltp.com

Streaming Automatic Speech Recognition (ASR) has gained significant attention across
various application scenarios, including video conferencing, live sports events, and …

[PDF] arxiv.org

UFO2: A unified pre-training framework for online and offline speech recognition

L Fu, S Li, Q Li, L Deng, F Li, L Fan… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

In this paper, we propose a Unified pre-training Framework for Online and Offline (UFO2)
Automatic Speech Recognition (ASR), which 1) simplifies the two separate training …

Cited by 7 Related articles All 5 versions

[PDF] arxiv.org

Factorized blank thresholding for improved runtime efficiency of neural transducers

D Le, F Seide, Y Wang, Y Li, K Schubert… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

We show how factoring the RNN-T's output distribution can significantly reduce the
computation cost and power consumption for on-device ASR inference with no loss in …

Cited by 5 Related articles All 5 versions