Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
Time-synchronous one-pass beam search for parallel online and offline transducers with dynamic block training
End-to-end automatic speech recognition (ASR) has become an increasingly popular area
of research, with two main models being online and offline ASR. Online models aim to …
A unified cascaded encoder ASR model for dynamic model sizes
In this paper, we propose a dynamic cascaded encoder Automatic Speech Recognition
(ASR) model, which unifies models for different deployment scenarios. Moreover, the model …
I3D: Transformer architectures with input-dependent dynamic depth for speech recognition
Transformer-based end-to-end speech recognition has achieved great success. However,
the large footprint and computational overhead make it difficult to deploy these models in …
Streaming parallel transducer beam search with fast-slow cascaded encoders
Streaming ASR with strict latency constraints is required in many speech recognition
applications. In order to achieve the required latency, streaming ASR models sacrifice …
Compute cost amortized transformer for streaming ASR
We present a streaming, Transformer-based end-to-end automatic speech recognition
(ASR) architecture which achieves efficient neural inference through compute cost …
Gated contextual adapters for selective contextual biasing in neural transducers
A Alexandridis, KM Sathyendra… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Neural contextual biasing for end-to-end neural ASR transducers has shown significant
improvements in the recognition of named entities, such as contact names or device names …
Label-Synchronous Neural Transducer for Adaptable Online E2E Speech Recognition
K Deng, PC Woodland - IEEE/ACM Transactions on Audio …, 2024 - ieeexplore.ieee.org
Although end-to-end (E2E) automatic speech recognition (ASR) has shown state-of-the-art
recognition accuracy, it tends to be implicitly biased towards the training data distribution …
Caching networks: Capitalizing on common speech for ASR
A Alexandridis, GP Strimel, A Rastrow… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
We introduce Caching Networks (CachingNets), a speech recognition network architecture
capable of delivering faster, more accurate decoding by leveraging common speech …
Dynamic Encoder Size Based on Data-Driven Layer-wise Pruning for Speech Recognition
Varying-size models are often required to deploy ASR systems under different hardware
and/or application constraints such as memory and latency. To avoid redundant training and …