A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
learning. The use of multiple processing layers has enabled the creation of models capable …
Deep transfer learning for automatic speech recognition: Towards better generalization
Automatic speech recognition (ASR) has recently become an important challenge when
using deep learning (DL). It requires large-scale training datasets and high computational …
using deep learning (DL). It requires large-scale training datasets and high computational …
Paraformer: Fast and accurate parallel transformer for non-autoregressive end-to-end speech recognition
Transformers have recently dominated the ASR field. Although able to yield good
performance, they involve an autoregressive (AR) decoder to generate tokens one by one …
performance, they involve an autoregressive (AR) decoder to generate tokens one by one …
XVO: Generalized visual odometry via cross-modal self-training
We propose XVO, a semi-supervised learning method for training generalized monocular
Visual Odometry (VO) models with robust off-the-self operation across diverse datasets and …
Visual Odometry (VO) models with robust off-the-self operation across diverse datasets and …
Label-synchronous neural transducer for end-to-end ASR
K Deng, PC Woodland - arXiv preprint arXiv:2307.03088, 2023 - arxiv.org
Neural transducers provide a natural approach to streaming ASR. However, they augment
output sequences with blank tokens which leads to challenges for domain adaptation using …
output sequences with blank tokens which leads to challenges for domain adaptation using …
Knowledge transfer from pre-trained language models to cif-based speech recognizers via hierarchical distillation
Large-scale pre-trained language models (PLMs) have shown great potential in natural
language processing tasks. Leveraging the capabilities of PLMs to enhance automatic …
language processing tasks. Leveraging the capabilities of PLMs to enhance automatic …
A context-aware knowledge transferring strategy for CTC-based ASR
Non-autoregressive automatic speech recognition (ASR) modeling has received increasing
attention recently because of its fast decoding speed and superior performance. Among …
attention recently because of its fast decoding speed and superior performance. Among …
[PDF][PDF] Knowledge Distillation For CTC-based Speech Recognition Via Consistent Acoustic Representation Learning.
Recently, end-to-end ASR models based on connectionist temporal classification (CTC)
have achieved impressive results, but their performance is limited in lightweight models …
have achieved impressive results, but their performance is limited in lightweight models …
Speech-text based multi-modal training with bidirectional attention for improved speech recognition
To let the state-of-the-art end-to-end ASR model enjoy data efficiency, as well as much more
unpaired text data by multi-modal training, one needs to address two problems: 1) the …
unpaired text data by multi-modal training, one needs to address two problems: 1) the …
Cross-modal Alignment with Optimal Transport for CTC-based ASR
Temporal connectionist temporal classification (CTC)-based automatic speech recognition
(ASR) is one of the most successful end to end (E2E) ASR frameworks. However, due to the …
(ASR) is one of the most successful end to end (E2E) ASR frameworks. However, due to the …