Advanced long-context end-to-end speech recognition using context-expanded transformers

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com

Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Cited by 326 - Related articles - All 7 versions

[PDF] academia.edu

End-to-end speech summarization using restricted self-attention

R Sharma, S Palaskar, AW Black… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Speech summarization is typically performed by using a cascade of speech recognition and
text summarization models. End-to-end modeling of speech summarization models is …

Cited by 20 - Related articles - All 3 versions

[PDF] arxiv.org

Advanced long-content speech recognition with factorized neural transducer

X Gong, Y Wu, J Li, S Liu, R Zhao… - … /ACM Transactions on …, 2024 - ieeexplore.ieee.org

Long-content automatic speech recognition (ASR) has obtained increasing interest in recent
years, as it captures the relationship among consecutive historical utterances while …

Cited by 3 - Related articles - All 4 versions

[PDF] arxiv.org

Conversational Speech Recognition by Learning Audio-textual Cross-modal Contextual Representation

K Wei, B Li, H Lv, Q Lu, N Jiang… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

Automatic Speech Recognition (ASR) in conversational settings presents unique
challenges, including extracting relevant contextual information from previous …

Cited by 4 - Related articles - All 3 versions

Context-aware end-to-end ASR using self-attentive embedding and tensor fusion

SY Chang, C Zhang, TN Sainath, B Li… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Typical automatic speech recognition (ASR) systems are built to recognize independent
utterances without using the cross-utterance context. However, the context over multiple …

Cited by 7 - Related articles

[PDF] arxiv.org

Towards effective and compact contextual representation for conformer transducer speech recognition systems

M Cui, J Kang, J Deng, X Yin, Y Xie, X Chen… - arXiv preprint arXiv …, 2023 - arxiv.org

Current ASR systems are mainly trained and evaluated at the utterance level. Long range
cross utterance context can be incorporated. A key task is to derive a suitable compact …

Cited by 5 - Related articles - All 5 versions

[PDF] arxiv.org

LongFNT: Long-form speech recognition with factorized neural transducer

X Gong, Y Wu, J Li, S Liu, R Zhao… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Traditional automatic speech recognition (ASR) systems usually focus on individual
utterances, without considering long-form speech with useful historical information, which is …

Cited by 6 - Related articles - All 3 versions

[PDF] arxiv.org

Context-aware fine-tuning of self-supervised speech models

S Shon, F Wu, K Kim, P Sridhar… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Self-supervised pre-trained transformers have improved the state of the art on a variety of
speech tasks. Due to the quadratic time and space complexity of self-attention, they usually …

Cited by 6 - Related articles - All 5 versions

[PDF] arxiv.org

Leveraging acoustic contextual representation by audio-textual cross-modal learning for conversational ASR

K Wei, Y Zhang, S Sun, L Xie, L Ma - arXiv preprint arXiv:2207.01039, 2022 - arxiv.org

Leveraging context information is an intuitive idea to improve performance on
conversational automatic speech recognition (ASR). Previous works usually adopt …

Cited by 8 - Related articles - All 4 versions

[PDF] arxiv.org

BASS: Block-wise adaptation for speech summarization

R Sharma, K Zheng, S Arora, S Watanabe… - arXiv preprint arXiv …, 2023 - arxiv.org

End-to-end speech summarization has been shown to improve performance over cascade
baselines. However, such models are difficult to train on very large inputs (dozens of …

Cited by 4 - Related articles - All 6 versions