Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Transformers in speech processing: A survey

S Latif, A Zaidi, H Cuayahuitl, F Shamshad… - arXiv preprint arXiv …, 2023 - arxiv.org
The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …

A study of transformer-based end-to-end speech recognition system for Kazakh language

M Orken, O Dina, A Keylan, T Tolganay, O Mohamed - Scientific reports, 2022 - nature.com
Today, the Transformer model, which allows parallelization and also has its own internal
attention, has been widely used in the field of speech recognition. The great advantage of …

Fast end-to-end speech recognition via non-autoregressive models and cross-modal knowledge transferring from BERT

Y Bai, J Yi, J Tao, Z Tian, Z Wen… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org
Attention-based encoder-decoder (AED) models have achieved promising performance in
speech recognition. However, because the decoder predicts text tokens (such as characters …

Advanced long-context end-to-end speech recognition using context-expanded transformers

T Hori, N Moritz, C Hori, JL Roux - arXiv preprint arXiv:2104.09426, 2021 - arxiv.org
This paper addresses end-to-end automatic speech recognition (ASR) for long audio
recordings such as lecture and conversational speeches. Most end-to-end ASR models are …

End-to-end speech summarization using restricted self-attention

R Sharma, S Palaskar, AW Black… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Speech summarization is typically performed by using a cascade of speech recognition and
text summarization models. End-to-end modeling of speech summarization models is …

Advanced long-content speech recognition with factorized neural transducer

X Gong, Y Wu, J Li, S Liu, R Zhao… - … /ACM Transactions on …, 2024 - ieeexplore.ieee.org
Long-content automatic speech recognition (ASR) has obtained increasing interest in recent
years, as it captures the relationship among consecutive historical utterances while …

Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation

K Wei, B Li, H Lv, Q Lu, N Jiang… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Automatic Speech Recognition (ASR) in conversational settings presents unique
challenges, including extracting relevant contextual information from previous …

Towards effective and compact contextual representation for conformer transducer speech recognition systems

M Cui, J Kang, J Deng, X Yin, Y Xie, X Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Current ASR systems are mainly trained and evaluated at the utterance level. Long range
cross utterance context can be incorporated. A key task is to derive a suitable compact …

AI‐based language tutoring systems with end‐to‐end automatic speech recognition and proficiency evaluation

BO Kang, HB Jeon, YK Lee - ETRI Journal, 2024 - Wiley Online Library
This paper presents the development of language tutoring systems for non‐native speakers
by leveraging advanced end‐to‐end automatic speech recognition (ASR) and proficiency …