Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
Google USM: Scaling automatic speech recognition beyond 100 languages
We introduce the Universal Speech Model (USM), a single large model that performs
automatic speech recognition (ASR) across 100+ languages. This is achieved by pre …
Understanding the role of self attention for efficient speech recognition
Self-attention (SA) is a critical component of Transformer neural networks that have
succeeded in automatic speech recognition (ASR). In this paper, we analyze the role of SA …
Conformer with dual-mode chunked attention for joint online and offline ASR
F Weninger, M Gaudesi, MA Haidar, N Ferri… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we present an in-depth study on online attention mechanisms and distillation
techniques for dual-mode (i.e., joint online and offline) ASR using the Conformer Transducer …
Variable attention masking for configurable transformer transducer speech recognition
This work studies the use of attention masking in transformer transducer based speech
recognition for building a single configurable model for different deployment scenarios. We …
Lookahead when it matters: Adaptive non-causal transformers for streaming neural transducers
Streaming speech recognition architectures are employed for low-latency, real-time
applications. Such architectures are often characterized by their causality. Causal …
Automatic Speech Recognition Design Modeling
K Babu Rao, B Mopuru, M Jawarneh… - Conversational …, 2024 - Wiley Online Library
The term “automatic speech recognition” refers to the procedure by which an auditory signal
of spoken words can be converted into text. Voice recognition is another term that may be …
Modular Conformer Training for Flexible End-to-End ASR
K Audhkhasi, B Farris, B Ramabhadran… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
The state-of-the-art conformer used in automatic speech recognition combines feed-forward,
convolution and multi-headed self-attention layers in a single model that is trained end-to …
A preliminary study on associated learning for ASR
In this paper, we propose the first successful implementation of associated learning (AL) to
automatic speech recognition (ASR). AL has been shown to provide better label noise …
Mixture model attention for flexible streaming and non-streaming automatic speech recognition
A method for an automated speech recognition (ASR) model for unifying streaming and non-
streaming speech recognition including receiving a sequence of acoustic frames. The …