Transformer-Based Long-Context End-to-End Speech Recognition.

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com

Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Cited by 326; Related articles; All 7 versions

[PDF] arxiv.org

Transformers in speech processing: A survey

S Latif, A Zaidi, H Cuayahuitl, F Shamshad… - arXiv preprint arXiv …, 2023 - arxiv.org

The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …

Cited by 39; Related articles; All 4 versions

[PDF] nature.com

A study of transformer-based end-to-end speech recognition system for Kazakh language

M Orken, O Dina, A Keylan, T Tolganay, O Mohamed - Scientific reports, 2022 - nature.com

Today, the Transformer model, which allows parallelization and also has its own internal
attention, has been widely used in the field of speech recognition. The great advantage of …

Cited by 30; Related articles; All 7 versions

[PDF] arxiv.org

Fast end-to-end speech recognition via non-autoregressive models and cross-modal knowledge transferring from BERT

Y Bai, J Yi, J Tao, Z Tian, Z Wen… - IEEE/ACM Transactions …, 2021 - ieeexplore.ieee.org

Attention-based encoder-decoder (AED) models have achieved promising performance in
speech recognition. However, because the decoder predicts text tokens (such as characters …

Cited by 57; Related articles; All 4 versions

[PDF] arxiv.org

Advanced long-context end-to-end speech recognition using context-expanded transformers

T Hori, N Moritz, C Hori, JL Roux - arXiv preprint arXiv:2104.09426, 2021 - arxiv.org

This paper addresses end-to-end automatic speech recognition (ASR) for long audio
recordings such as lecture and conversational speeches. Most end-to-end ASR models are …

Cited by 36; Related articles; All 6 versions

[PDF] academia.edu

End-to-end speech summarization using restricted self-attention

R Sharma, S Palaskar, AW Black… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Speech summarization is typically performed by using a cascade of speech recognition and
text summarization models. End-to-end modeling of speech summarization models is …

Cited by 20; Related articles; All 3 versions

[PDF] arxiv.org

Advanced long-content speech recognition with factorized neural transducer

X Gong, Y Wu, J Li, S Liu, R Zhao… - … /ACM Transactions on …, 2024 - ieeexplore.ieee.org

Long-content automatic speech recognition (ASR) has obtained increasing interest in recent
years, as it captures the relationship among consecutive historical utterances while …

Cited by 3; Related articles; All 4 versions

[PDF] arxiv.org

Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation

K Wei, B Li, H Lv, Q Lu, N Jiang… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

Automatic Speech Recognition (ASR) in conversational settings presents unique
challenges, including extracting relevant contextual information from previous …

Cited by 4; Related articles; All 3 versions

[PDF] arxiv.org

Towards effective and compact contextual representation for conformer transducer speech recognition systems

M Cui, J Kang, J Deng, X Yin, Y Xie, X Chen… - arXiv preprint arXiv …, 2023 - arxiv.org

Current ASR systems are mainly trained and evaluated at the utterance level. Long range
cross utterance context can be incorporated. A key task is to derive a suitable compact …

Cited by 5; Related articles; All 5 versions

[PDF] wiley.com

AI‐based language tutoring systems with end‐to‐end automatic speech recognition and proficiency evaluation

BO Kang, HB Jeon, YK Lee - ETRI Journal, 2024 - Wiley Online Library

This paper presents the development of language tutoring systems for non‐native speakers
by leveraging advanced end‐to‐end automatic speech recognition (ASR) and proficiency …

Cited by 4; Related articles; All 3 versions