A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Deep transfer learning for automatic speech recognition: Towards better generalization

H Kheddar, Y Himeur, S Al-Maadeed, A Amira… - Knowledge-Based …, 2023 - Elsevier
Automatic speech recognition (ASR) with deep learning (DL) has recently become an
important challenge. It requires large-scale training datasets and high computational …

Paraformer: Fast and accurate parallel transformer for non-autoregressive end-to-end speech recognition

Z Gao, S Zhang, I McLoughlin, Z Yan - arXiv preprint arXiv:2206.08317, 2022 - arxiv.org
Transformers have recently dominated the ASR field. Although able to yield good
performance, they involve an autoregressive (AR) decoder to generate tokens one by one …
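The contrast this abstract draws can be illustrated with a toy sketch (assumed, not from the paper): an autoregressive decoder must call its predictor once per token, conditioning on the prefix, while a non-autoregressive decoder like Paraformer emits all positions in a single parallel pass. The `predict_next` and `predict_all` callables here are hypothetical stand-ins for real model components.

```python
def ar_decode(encoder_out, predict_next, max_len, eos):
    """Autoregressive decoding: generate tokens one by one,
    each step conditioned on the tokens emitted so far."""
    tokens = []
    for _ in range(max_len):
        tok = predict_next(encoder_out, tokens)
        if tok == eos:
            break
        tokens.append(tok)
    return tokens

def nar_decode(encoder_out, predict_all):
    """Non-autoregressive decoding: predict every output position
    in one parallel pass, with no dependency on previous tokens."""
    return predict_all(encoder_out)

# Toy predictors standing in for real acoustic/decoder networks.
next_tok = lambda enc, prefix: ["h", "i", "</s>"][len(prefix)]
all_toks = lambda enc: ["h", "i"]

print(ar_decode(None, next_tok, 10, "</s>"))  # ['h', 'i'] after 3 sequential calls
print(nar_decode(None, all_toks))             # ['h', 'i'] in a single call
```

The speed argument is that `nar_decode` replaces `max_len` sequential predictor calls with one, at the cost of modeling token dependencies elsewhere (e.g., Paraformer's CIF-style predictor).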

XVO: Generalized visual odometry via cross-modal self-training

L Lai, Z Shangguan, J Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose XVO, a semi-supervised learning method for training generalized monocular
Visual Odometry (VO) models with robust off-the-shelf operation across diverse datasets and …

Label-synchronous neural transducer for end-to-end ASR

K Deng, PC Woodland - arXiv preprint arXiv:2307.03088, 2023 - arxiv.org
Neural transducers provide a natural approach to streaming ASR. However, they augment
output sequences with blank tokens, which leads to challenges for domain adaptation using …

Knowledge transfer from pre-trained language models to cif-based speech recognizers via hierarchical distillation

M Han, F Chen, J Shi, S Xu, B Xu - arXiv preprint arXiv:2301.13003, 2023 - arxiv.org
Large-scale pre-trained language models (PLMs) have shown great potential in natural
language processing tasks. Leveraging the capabilities of PLMs to enhance automatic …

A context-aware knowledge transferring strategy for CTC-based ASR

KH Lu, KY Chen - 2022 IEEE Spoken Language Technology …, 2023 - ieeexplore.ieee.org
Non-autoregressive automatic speech recognition (ASR) modeling has received increasing
attention recently because of its fast decoding speed and superior performance. Among …

Knowledge Distillation For CTC-based Speech Recognition Via Consistent Acoustic Representation Learning

S Tian, K Deng, Z Li, L Ye, G Cheng, T Li, Y Yan - Interspeech, 2022 - isca-archive.org
Recently, end-to-end ASR models based on connectionist temporal classification (CTC)
have achieved impressive results, but their performance is limited in lightweight models …

Speech-text based multi-modal training with bidirectional attention for improved speech recognition

Y Yang, H Xu, H Huang, ES Chng… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
To let a state-of-the-art end-to-end ASR model benefit from data efficiency, as well as from much more
unpaired text data via multi-modal training, one needs to address two problems: 1) the …

Cross-modal Alignment with Optimal Transport for CTC-based ASR

X Lu, P Shen, Y Tsao, H Kawai - 2023 IEEE Automatic Speech …, 2023 - ieeexplore.ieee.org
Connectionist temporal classification (CTC)-based automatic speech recognition
(ASR) is one of the most successful end-to-end (E2E) ASR frameworks. However, due to the …
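Several of the abstracts above revolve around CTC's output convention. As a minimal illustration (a toy pure-Python sketch, not any of these papers' implementations), CTC maps a per-frame label sequence to a transcript by merging adjacent repeated labels and then dropping a reserved blank symbol:

```python
def ctc_collapse(frame_labels, blank="_"):
    """Apply CTC's many-to-one mapping: merge adjacent repeated
    labels, then drop the blank symbol."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

# One frame-level labeling of "hello": repeats model slow frames,
# blanks separate genuinely repeated characters (the double 'l').
print("".join(ctc_collapse(list("hh_e_ll_llo_"))))  # hello
```

This many-to-one mapping is what makes CTC alignment-free, and it is also the source of the peaky, frame-asynchronous behavior that the distillation and cross-modal alignment papers listed above try to work around.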