Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
Large-scale ASR domain adaptation using self- and semi-supervised learning
Self- and semi-supervised learning methods have been actively investigated to reduce
labeled training data or enhance model performance. However, these approaches mostly …
Pseudo label is better than human label
State-of-the-art automatic speech recognition (ASR) systems are trained with tens of
thousands of hours of labeled speech data. Human transcription is expensive and time …
Improving the latency and quality of cascaded encoders
In this paper, we explore reducing computational latency of the 2-pass cascaded encoder
model [1]. Specifically, we experiment with reducing the size of the causal 1st-pass and …
On addressing practical challenges for RNN-transducer
In this paper, several works are proposed to address practical challenges for deploying
RNN Transducer (RNN-T) based speech recognition systems. These challenges are …
ASR and emotional speech: A word-level investigation of the mutual impact of speech and emotion recognition
In Speech Emotion Recognition (SER), textual data is often used alongside audio signals to
address their inherent variability. However, the reliance on human annotated text in most …
ASR rescoring and confidence estimation with ELECTRA
In automatic speech recognition (ASR) rescoring, the hypothesis with the fewest errors
should be selected from the n-best list using a language model (LM). However, LMs are …
ETEH: Unified attention-based end-to-end ASR and KWS architecture
Even though attention-based end-to-end (E2E) automatic speech recognition (ASR) models
have been yielding state-of-the-art recognition accuracy, they still fall behind many of the …
Residual energy-based models for end-to-end speech recognition
End-to-end models with auto-regressive decoders have shown impressive results for
automatic speech recognition (ASR). These models formulate the sequence-level probability …
Multi-task learning for end-to-end ASR word and utterance confidence with deletion prediction
Confidence scores are very useful for downstream applications of automatic speech
recognition (ASR) systems. Recent works have proposed using neural networks to learn …