Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
Streaming automatic speech recognition with the transformer model
Encoder-decoder based sequence-to-sequence models have demonstrated state-of-the-art
results in end-to-end automatic speech recognition (ASR). Recently, the transformer …
Attention-inspired artificial neural networks for speech processing: A systematic review
N Zacarias-Morales, P Pancardo… - Symmetry, 2021 - mdpi.com
Artificial Neural Networks (ANNs) were created inspired by the neural networks in the
human brain and have been widely applied in speech processing. The application areas of …
How does pre-trained wav2vec 2.0 perform on domain-shifted ASR? An extensive benchmark on air traffic control communications
J Zuluaga-Gomez, A Prasad… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Recent work on self-supervised pre-training focuses on leveraging large-scale unlabeled
speech data to build robust end-to-end (E2E) acoustic models (AM) that can be later fine …
Spike-triggered non-autoregressive transformer for end-to-end speech recognition
Non-autoregressive transformer models have achieved extremely fast inference speed and
comparable performance with autoregressive sequence-to-sequence models in neural …
A new training pipeline for an improved neural transducer
The RNN transducer is a promising end-to-end model candidate. We compare the original
training criterion with the full marginalization over all alignments, to the commonly used …
Injecting text in self-supervised speech pretraining
Self-supervised pretraining for Automated Speech Recognition (ASR) has shown varied
degrees of success. In this paper, we propose to jointly learn representations during …
Advanced long-context end-to-end speech recognition using context-expanded transformers
This paper addresses end-to-end automatic speech recognition (ASR) for long audio
recordings such as lecture and conversational speeches. Most end-to-end ASR models are …
Momentum pseudo-labeling: Semi-supervised ASR with continuously improving pseudo-labels
End-to-end automatic speech recognition (ASR) has become a popular alternative to
traditional module-based systems, simplifying the model-building process with a single deep …
Semi-supervised speech recognition via graph-based temporal classification
Semi-supervised learning has demonstrated promising results in automatic speech
recognition (ASR) by self-training using a seed ASR model with pseudo-labels generated for …