Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
Transformers in speech processing: A survey
The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …
A study of transformer-based end-to-end speech recognition system for Kazakh language
Today, the Transformer model, which allows parallelization and also has its own internal
attention, has been widely used in the field of speech recognition. The great advantage of …
Fast end-to-end speech recognition via non-autoregressive models and cross-modal knowledge transferring from BERT
Attention-based encoder-decoder (AED) models have achieved promising performance in
speech recognition. However, because the decoder predicts text tokens (such as characters …
Advanced long-context end-to-end speech recognition using context-expanded transformers
This paper addresses end-to-end automatic speech recognition (ASR) for long audio
recordings such as lecture and conversational speeches. Most end-to-end ASR models are …
End-to-end speech summarization using restricted self-attention
Speech summarization is typically performed by using a cascade of speech recognition and
text summarization models. End-to-end modeling of speech summarization models is …
Advanced long-content speech recognition with factorized neural transducer
Long-content automatic speech recognition (ASR) has obtained increasing interest in recent
years, as it captures the relationship among consecutive historical utterances while …
Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation
Automatic Speech Recognition (ASR) in conversational settings presents unique
challenges, including extracting relevant contextual information from previous …
Towards effective and compact contextual representation for conformer transducer speech recognition systems
Current ASR systems are mainly trained and evaluated at the utterance level. Long range
cross utterance context can be incorporated. A key task is to derive a suitable compact …
AI‐based language tutoring systems with end‐to‐end automatic speech recognition and proficiency evaluation
This paper presents the development of language tutoring systems for non‐native speakers
by leveraging advanced end‐to‐end automatic speech recognition (ASR) and proficiency …