End-to-end speech recognition: A survey
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …
Distilling the knowledge of BERT for sequence-to-sequence ASR
Attention-based sequence-to-sequence (seq2seq) models have achieved promising results
in automatic speech recognition (ASR). However, as these models decode in a left-to-right …
Open source MagicData-RAMC: A rich annotated Mandarin conversational (RAMC) speech dataset
This paper introduces a high-quality rich annotated Mandarin conversational (RAMC)
speech dataset called MagicData-RAMC. The MagicData-RAMC corpus contains 180 hours …
Advanced long-context end-to-end speech recognition using context-expanded transformers
This paper addresses end-to-end automatic speech recognition (ASR) for long audio
recordings such as lecture and conversational speeches. Most end-to-end ASR models are …
Hierarchical transformer-based large-context end-to-end ASR with large-context knowledge distillation
R Masumura, N Makishima, M Ihori… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
We present a novel large-context end-to-end automatic speech recognition (E2E-ASR)
model and its effective training method based on knowledge distillation. Common E2E-ASR …
End-to-end automatic speech recognition integrated with CTC-based voice activity detection
This paper integrates a voice activity detection (VAD) function with end-to-end automatic
speech recognition toward an online speech interface and transcribing very long audio …
Advanced long-content speech recognition with factorized neural transducer
Long-content automatic speech recognition (ASR) has obtained increasing interest in recent
years, as it captures the relationship among consecutive historical utterances while …
Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation
Automatic Speech Recognition (ASR) in conversational settings presents unique
challenges, including extracting relevant contextual information from previous …
Context-aware end-to-end ASR using self-attentive embedding and tensor fusion
Typical automatic speech recognition (ASR) systems are built to recognize independent
utterances without using the cross-utterance context. However, the context over multiple …
Transformer-Based Long-Context End-to-End Speech Recognition.
This paper presents an approach to long-context end-to-end automatic speech recognition
(ASR) using Transformers, aiming at improving ASR accuracy for long audio recordings …