A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
A practical survey on faster and lighter transformers
Q Fournier, GM Caron, D Aloise - ACM Computing Surveys, 2023 - dl.acm.org
Recurrent neural networks are effective models to process sequences. However, they are
unable to learn long-term dependencies because of their inherent sequential nature. As a …
Visual speech recognition for multiple languages in the wild
Visual speech recognition (VSR) aims to recognize the content of speech based on lip
movements, without relying on the audio stream. Advances in deep learning and the …
Squeezeformer: An efficient transformer for automatic speech recognition
The recently proposed Conformer model has become the de facto backbone model for
various downstream speech tasks based on its hybrid attention-convolution architecture that …
Intermediate loss regularization for CTC-based speech recognition
J Lee, S Watanabe - ICASSP 2021-2021 IEEE International …, 2021 - ieeexplore.ieee.org
We present a simple and efficient auxiliary loss function for automatic speech recognition
(ASR) based on the connectionist temporal classification (CTC) objective. The proposed …
Contextualized streaming end-to-end speech recognition with trie-based deep biasing and shallow fusion
How to leverage dynamic contextual information in end-to-end speech recognition has
remained an active research area. Previous solutions to this problem were either designed …
Deep shallow fusion for RNN-T personalization
End-to-end models in general, and Recurrent Neural Network Transducer (RNN-T) in
particular, have gained significant traction in the automatic speech recognition community in …
Towards measuring fairness in speech recognition: Casual conversations dataset transcriptions
The problem of machine learning systems demonstrating bias towards specific groups of
individuals has been studied extensively, particularly in the Facial Recognition area, but …
A study of transducer based end-to-end ASR with ESPnet: Architecture, auxiliary loss and decoding strategies
F Boyer, Y Shinohara, T Ishii… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
In this study, we present recent developments of models trained with the RNN-T loss in
ESPnet. It involves the use of various architectures such as recently proposed Conformer …
Multi-head state space model for speech recognition
State space models (SSMs) have recently shown promising results on small-scale sequence
and language modelling tasks, rivalling and outperforming many attention-based …