End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

A survey on non-autoregressive generation for neural machine translation and beyond

Y Xiao, L Wu, J Guo, J Li, M Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Non-autoregressive (NAR) generation, which was first proposed in neural machine translation
(NMT) to speed up inference, has attracted much attention in both machine learning and …

Paraformer: Fast and accurate parallel transformer for non-autoregressive end-to-end speech recognition

Z Gao, S Zhang, I McLoughlin, Z Yan - arXiv preprint arXiv:2206.08317, 2022 - arxiv.org
Transformers have recently dominated the ASR field. Although able to yield good
performance, they involve an autoregressive (AR) decoder to generate tokens one by one …

Deliberation of streaming RNN-transducer by non-autoregressive decoding

W Wang, K Hu, TN Sainath - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
We propose to deliberate the hypothesis alignment of a streaming RNN-T model with the
previously proposed Align-Refine non-autoregressive decoding method and its improved …

Text-Only Domain Adaptation Based on Intermediate CTC

H Sato, T Komori, T Mishima, Y Kawai, T Mochizuki… - Interspeech, 2022 - isca-archive.org
We propose a domain adaptation method that enables connectionist temporal classification
(CTC)-based end-to-end (E2E) automatic speech recognition (ASR) models to adapt to a …

BECTRA: Transducer-based end-to-end ASR with BERT-enhanced encoder

Y Higuchi, T Ogawa, T Kobayashi… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
We present BERT-CTC-Transducer (BECTRA), a novel end-to-end automatic speech
recognition (E2E-ASR) model formulated by the transducer with a BERT-enhanced encoder …

SFA: Searching faster architectures for end-to-end automatic speech recognition models

Y Liu, T Li, P Zhang, Y Yan - Computer Speech & Language, 2023 - Elsevier
Recently End-to-end (E2E) Automatic Speech Recognition (ASR) has been widely used due
to its advantages over the hybrid method. Even though existing E2E ASR models have …

Non-autoregressive end-to-end automatic speech recognition incorporating downstream natural language processing

M Omachi, Y Fujita, S Watanabe… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
We propose a fast and accurate end-to-end (E2E) model, which executes automatic speech
recognition (ASR) and downstream natural language processing (NLP) simultaneously. The …

Improving Streaming End-to-End ASR on Transformer-Based Causal Models With Encoder States Revision Strategies

Z Li, H Miao, K Deng, G Cheng, S Tian, T Li… - arXiv preprint arXiv …, 2022 - arxiv.org
There is often a trade-off between performance and latency in streaming automatic speech
recognition (ASR). Traditional methods such as look-ahead and chunk-based methods …

LV-CTC: Non-autoregressive ASR with CTC and latent variable models

Y Fujita, S Watanabe, X Chang… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Non-autoregressive (NAR) models for automatic speech recognition (ASR) aim to achieve
high accuracy and fast inference by simplifying the autoregressive (AR) generation process …