End-to-end speech recognition: A survey
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …
A survey on non-autoregressive generation for neural machine translation and beyond
Non-autoregressive (NAR) generation, which is first proposed in neural machine translation
(NMT) to speed up inference, has attracted much attention in both machine learning and …
Paraformer: Fast and accurate parallel transformer for non-autoregressive end-to-end speech recognition
Transformers have recently dominated the ASR field. Although able to yield good
performance, they involve an autoregressive (AR) decoder to generate tokens one by one …
Deliberation of streaming RNN-transducer by non-autoregressive decoding
We propose to deliberate the hypothesis alignment of a streaming RNN-T model with the
previously proposed Align-Refine non-autoregressive decoding method and its improved …
Text-Only Domain Adaptation Based on Intermediate CTC.
H Sato, T Komori, T Mishima, Y Kawai, T Mochizuki… - Interspeech, 2022 - isca-archive.org
We propose a domain adaptation method that enables connectionist temporal classification
(CTC)-based end-to-end (E2E) automatic speech recognition (ASR) models to adapt to a …
BECTRA: Transducer-based end-to-end ASR with BERT-enhanced encoder
We present BERT-CTC-Transducer (BECTRA), a novel end-to-end automatic speech
recognition (E2E-ASR) model formulated by the transducer with a BERT-enhanced encoder …
SFA: Searching faster architectures for end-to-end automatic speech recognition models
Recently End-to-end (E2E) Automatic Speech Recognition (ASR) has been widely used due
to its advantages over the hybrid method. Even though existing E2E ASR models have …
Non-autoregressive end-to-end automatic speech recognition incorporating downstream natural language processing
We propose a fast and accurate end-to-end (E2E) model, which executes automatic speech
recognition (ASR) and downstream natural language processing (NLP) simultaneously. The …
Improving Streaming End-to-End ASR on Transformer-Based Causal Models With Encoder States Revision Strategies
There is often a trade-off between performance and latency in streaming automatic speech
recognition (ASR). Traditional methods such as look-ahead and chunk-based methods …
LV-CTC: Non-autoregressive ASR with CTC and latent variable models
Non-autoregressive (NAR) models for automatic speech recognition (ASR) aim to achieve
high accuracy and fast inference by simplifying the autoregressive (AR) generation process …