Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
Deep shallow fusion for RNN-T personalization
End-to-end models in general, and Recurrent Neural Network Transducer (RNN-T) in
particular, have gained significant traction in the automatic speech recognition community in …
Deliberation model based two-pass end-to-end speech recognition
End-to-end (E2E) models have made rapid progress in automatic speech recognition (ASR)
and perform competitively relative to conventional models. To further improve the quality, a …
Modular hybrid autoregressive transducer
Text-only adaptation of a transducer model remains challenging for end-to-end speech
recognition since the transducer has no clearly separated acoustic model (AM), language …
How might we create better benchmarks for speech recognition?
A Aksënova, D van Esch, J Flynn… - Proceedings of the 1st …, 2021 - aclanthology.org
The applications of automatic speech recognition (ASR) systems are proliferating, in part
due to recent significant quality improvements. However, as recent work indicates, even …
Transformer based deliberation for two-pass speech recognition
Interactive speech recognition systems must generate words quickly while also producing
accurate results. Two-pass models excel at these requirements by employing a first-pass …
Neural inverse text normalization
While there have been several contributions exploring state of the art techniques for text
normalization, the problem of inverse text normalization (ITN) remains relatively unexplored …
On addressing practical challenges for RNN-transducer
In this paper, several works are proposed to address practical challenges for deploying
RNN Transducer (RNN-T) based speech recognition systems. These challenges are …
Alleviating ASR long-tailed problem by decoupling the learning of representation and classification
Recently, we have witnessed excellent improvement of end-to-end (E2E) automatic speech
recognition (ASR). However, how to tackle the long-tailed data distribution problem while …
Using speech synthesis to train end-to-end spoken language understanding models
End-to-end models are an attractive new approach to spoken language understanding
(SLU) in which the meaning of an utterance is inferred directly from the raw audio, without …