Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Streaming automatic speech recognition with the transformer model

N Moritz, T Hori, J Le Roux - ICASSP 2020-2020 IEEE International …, 2020 - ieeexplore.ieee.org
Encoder-decoder based sequence-to-sequence models have demonstrated state-of-the-art
results in end-to-end automatic speech recognition (ASR). Recently, the transformer …

Attention-inspired artificial neural networks for speech processing: A systematic review

N Zacarias-Morales, P Pancardo… - Symmetry, 2021 - mdpi.com
Artificial Neural Networks (ANNs) were created inspired by the neural networks in the
human brain and have been widely applied in speech processing. The application areas of …

How does pre-trained wav2vec 2.0 perform on domain-shifted ASR? An extensive benchmark on air traffic control communications

J Zuluaga-Gomez, A Prasad… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Recent work on self-supervised pre-training focuses on leveraging large-scale unlabeled
speech data to build robust end-to-end (E2E) acoustic models (AM) that can be later fine …

Spike-triggered non-autoregressive transformer for end-to-end speech recognition

Z Tian, J Yi, J Tao, Y Bai, S Zhang, Z Wen - arXiv preprint arXiv …, 2020 - arxiv.org
Non-autoregressive transformer models have achieved extremely fast inference speed and
comparable performance with autoregressive sequence-to-sequence models in neural …

A new training pipeline for an improved neural transducer

A Zeyer, A Merboldt, R Schlüter, H Ney - arXiv preprint arXiv:2005.09319, 2020 - arxiv.org
The RNN transducer is a promising end-to-end model candidate. We compare the original
training criterion with the full marginalization over all alignments, to the commonly used …

Injecting text in self-supervised speech pretraining

Z Chen, Y Zhang, A Rosenberg… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
Self-supervised pretraining for Automated Speech Recognition (ASR) has shown varied
degrees of success. In this paper, we propose to jointly learn representations during …

Advanced long-context end-to-end speech recognition using context-expanded transformers

T Hori, N Moritz, C Hori, J Le Roux - arXiv preprint arXiv:2104.09426, 2021 - arxiv.org
This paper addresses end-to-end automatic speech recognition (ASR) for long audio
recordings such as lecture and conversational speeches. Most end-to-end ASR models are …

Momentum pseudo-labeling: Semi-supervised ASR with continuously improving pseudo-labels

Y Higuchi, N Moritz, J Le Roux… - IEEE Journal of Selected …, 2022 - ieeexplore.ieee.org
End-to-end automatic speech recognition (ASR) has become a popular alternative to
traditional module-based systems, simplifying the model-building process with a single deep …

Semi-supervised speech recognition via graph-based temporal classification

N Moritz, T Hori, J Le Roux - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
Semi-supervised learning has demonstrated promising results in automatic speech
recognition (ASR) by self-training using a seed ASR model with pseudo-labels generated for …