A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Deep transfer learning for automatic speech recognition: Towards better generalization

H Kheddar, Y Himeur, S Al-Maadeed, A Amira… - Knowledge-Based …, 2023 - Elsevier
Automatic speech recognition (ASR) with deep learning (DL) has recently become an
important challenge. It requires large-scale training datasets and high computational …

Paraformer: Fast and accurate parallel transformer for non-autoregressive end-to-end speech recognition

Z Gao, S Zhang, I McLoughlin, Z Yan - arXiv preprint arXiv:2206.08317, 2022 - arxiv.org
Transformers have recently dominated the ASR field. Although able to yield good
performance, they involve an autoregressive (AR) decoder to generate tokens one by one …
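The contrast this abstract draws can be illustrated with a toy sketch (assumed, not from the paper): an autoregressive decoder must call its predictor once per token, conditioning on the prefix, while a non-autoregressive decoder like Paraformer emits all positions in a single parallel pass. The `predict_next` and `predict_all` callables here are hypothetical stand-ins for real model components.

```python
def ar_decode(encoder_out, predict_next, max_len, eos):
    """Autoregressive decoding: generate tokens one by one,
    each step conditioned on the tokens emitted so far."""
    tokens = []
    for _ in range(max_len):
        tok = predict_next(encoder_out, tokens)
        if tok == eos:
            break
        tokens.append(tok)
    return tokens

def nar_decode(encoder_out, predict_all):
    """Non-autoregressive decoding: predict every output position
    in one parallel pass, with no dependency on previous tokens."""
    return predict_all(encoder_out)

# Toy predictors standing in for real acoustic/decoder networks.
next_tok = lambda enc, prefix: ["h", "i", "</s>"][len(prefix)]
all_toks = lambda enc: ["h", "i"]

print(ar_decode(None, next_tok, 10, "</s>"))  # ['h', 'i'] after 3 sequential calls
print(nar_decode(None, all_toks))             # ['h', 'i'] in a single call
```

The speed argument is that `nar_decode` replaces `max_len` sequential predictor calls with one, at the cost of modeling token dependencies elsewhere (e.g., Paraformer's CIF-style predictor).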

XVO: Generalized visual odometry via cross-modal self-training

L Lai, Z Shangguan, J Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose XVO, a semi-supervised learning method for training generalized monocular
Visual Odometry (VO) models with robust off-the-shelf operation across diverse datasets and …

Label-synchronous neural transducer for end-to-end ASR

K Deng, PC Woodland - arXiv preprint arXiv:2307.03088, 2023 - arxiv.org
Neural transducers provide a natural approach to streaming ASR. However, they augment
output sequences with blank tokens, which leads to challenges for domain adaptation using …

Knowledge transfer from pre-trained language models to cif-based speech recognizers via hierarchical distillation

M Han, F Chen, J Shi, S Xu, B Xu - arXiv preprint arXiv:2301.13003, 2023 - arxiv.org
Large-scale pre-trained language models (PLMs) have shown great potential in natural
language processing tasks. Leveraging the capabilities of PLMs to enhance automatic …

A context-aware knowledge transferring strategy for CTC-based ASR

KH Lu, KY Chen - 2022 IEEE Spoken Language Technology …, 2023 - ieeexplore.ieee.org
Non-autoregressive automatic speech recognition (ASR) modeling has received increasing
attention recently because of its fast decoding speed and superior performance. Among …

Knowledge Distillation For CTC-based Speech Recognition Via Consistent Acoustic Representation Learning

S Tian, K Deng, Z Li, L Ye, G Cheng, T Li, Y Yan - Interspeech, 2022 - isca-archive.org
Recently, end-to-end ASR models based on connectionist temporal classification (CTC)
have achieved impressive results, but their performance is limited in lightweight models …

Speech-text based multi-modal training with bidirectional attention for improved speech recognition

Y Yang, H Xu, H Huang, ES Chng… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
To let a state-of-the-art end-to-end ASR model benefit from data efficiency, as well as from much more
unpaired text data via multi-modal training, one needs to address two problems: 1) the …

Cross-modal Alignment with Optimal Transport for CTC-based ASR

X Lu, P Shen, Y Tsao, H Kawai - 2023 IEEE Automatic Speech …, 2023 - ieeexplore.ieee.org
Connectionist temporal classification (CTC)-based automatic speech recognition
(ASR) is one of the most successful end-to-end (E2E) ASR frameworks. However, due to the …
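Several of the abstracts above revolve around CTC's output convention. As a minimal illustration (a toy pure-Python sketch, not any of these papers' implementations), CTC maps a per-frame label sequence to a transcript by merging adjacent repeated labels and then dropping a reserved blank symbol:

```python
def ctc_collapse(frame_labels, blank="_"):
    """Apply CTC's many-to-one mapping: merge adjacent repeated
    labels, then drop the blank symbol."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

# One frame-level labeling of "hello": repeats model slow frames,
# blanks separate genuinely repeated characters (the double 'l').
print("".join(ctc_collapse(list("hh_e_ll_llo_"))))  # hello
```

This many-to-one mapping is what makes CTC alignment-free, and it is also the source of the peaky, frame-asynchronous behavior that the distillation and cross-modal alignment papers listed above try to work around.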