[PDF][PDF] Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
A comprehensive survey on applications of transformers for deep learning tasks
S Islam, H Elmekki, A Elsebai, J Bentahar… - Expert Systems with …, 2023 - Elsevier
Abstract Transformers are Deep Neural Networks (DNN) that utilize a self-attention
mechanism to capture contextual relationships within sequential data. Unlike traditional …
mechanism to capture contextual relationships within sequential data. Unlike traditional …
Scan and snap: Understanding training dynamics and token composition in 1-layer transformer
Transformer architecture has shown impressive performance in multiple research domains
and has become the backbone of many neural network models. However, there is limited …
and has become the backbone of many neural network models. However, there is limited …
End-to-end speech recognition: A survey
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …
learning has brought considerable reductions in word error rate of more than 50% relative …
Developing real-time streaming transformer transducer for speech recognition on large-scale dataset
Recently, Transformer based end-to-end models have achieved great success in many
areas including speech recognition. However, compared to LSTM models, the heavy …
areas including speech recognition. However, compared to LSTM models, the heavy …
Emformer: Efficient memory transformer based acoustic model for low latency streaming speech recognition
This paper proposes an efficient memory transformer Emformer for low latency streaming
speech recognition. In Emformer, the long-range history context is distilled into an …
speech recognition. In Emformer, the long-range history context is distilled into an …
[HTML][HTML] Audio-visual speech and gesture recognition by sensors of mobile devices
Audio-visual speech recognition (AVSR) is one of the most promising solutions for reliable
speech recognition, particularly when audio is corrupted by noise. Additional visual …
speech recognition, particularly when audio is corrupted by noise. Additional visual …
Transformers in speech processing: A survey
The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …
sparked the interest of the speech-processing community, leading to an exploration of their …
Data movement is all you need: A case study on optimizing transformers
Transformers are one of the most important machine learning workloads today. Training one
is a very compute-intensive task, often taking days or weeks, and significant attention has …
is a very compute-intensive task, often taking days or weeks, and significant attention has …
Advancing RNN transducer technology for speech recognition
We investigate a set of techniques for RNN Transducers (RNN-Ts) that were instrumental in
lowering the word error rate on three different tasks (Switchboard 300 hours, conversational …
lowering the word error rate on three different tasks (Switchboard 300 hours, conversational …