Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
Acoustic modeling based on deep learning for low-resource speech recognition: An overview
C Yu, M Kang, Y Chen, J Wu, X Zhao - IEEE Access, 2020 - ieeexplore.ieee.org
The polarization of world languages is becoming more and more obvious. Many languages,
mainly endangered languages, are of low-resource attribute due to lack of information. Both …
Multilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modeling
Sequence-to-sequence (seq2seq) approach for low-resource ASR is a relatively new
direction in speech research. The approach benefits by performing model training without …
Deep lip reading: a comparison of models and an online application
The goal of this paper is to develop state-of-the-art models for lip reading--visual speech
recognition. We develop three architectures and compare their accuracy and training …
Mixspeech: Cross-modality self-learning with audio-visual stream mixup for visual speech translation and recognition
Multi-media communications facilitate global interaction among people. However, despite
researchers exploring cross-lingual translation techniques such as machine translation and …
Attention-based end-to-end models for small-footprint keyword spotting
In this paper, we propose an attention-based end-to-end neural approach for small-footprint
keyword spotting (KWS), which aims to simplify the pipelines of building a production-quality …
Streaming small-footprint keyword spotting using sequence-to-sequence models
We develop streaming keyword spotting systems using a recurrent neural network
transducer (RNN-T) model: an all-neural, end-to-end trained, sequence-to-sequence model …
Seeing wake words: Audio-visual keyword spotting
The goal of this work is to automatically determine whether and when a word of interest is
spoken by a talking face, with or without the audio. We propose a zero-shot method suitable …
End-to-end speech recognition from federated acoustic models
Training Automatic Speech Recognition (ASR) models under federated learning (FL)
settings has attracted a lot of attention recently. However, the FL scenarios often presented …
Language-agnostic multilingual modeling
A Datta, B Ramabhadran, J Emond… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Multilingual Automated Speech Recognition (ASR) systems allow for the joint training of
data-rich and data-scarce languages in a single model. This enables data and parameter …