Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Deep shallow fusion for RNN-T personalization

D Le, G Keren, J Chan, J Mahadeokar… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
End-to-end models in general, and Recurrent Neural Network Transducer (RNN-T) in
particular, have gained significant traction in the automatic speech recognition community in …

Deliberation model based two-pass end-to-end speech recognition

K Hu, TN Sainath, R Pang… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
End-to-end (E2E) models have made rapid progress in automatic speech recognition (ASR)
and perform competitively relative to conventional models. To further improve the quality, a …

Modular hybrid autoregressive transducer

Z Meng, T Chen, R Prabhavalkar… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Text-only adaptation of a transducer model remains challenging for end-to-end speech
recognition since the transducer has no clearly separated acoustic model (AM), language …

How might we create better benchmarks for speech recognition?

A Aksënova, D van Esch, J Flynn… - Proceedings of the 1st …, 2021 - aclanthology.org
The applications of automatic speech recognition (ASR) systems are proliferating, in part
due to recent significant quality improvements. However, as recent work indicates, even …

Transformer based deliberation for two-pass speech recognition

K Hu, R Pang, TN Sainath… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org
Interactive speech recognition systems must generate words quickly while also producing
accurate results. Two-pass models excel at these requirements by employing a first-pass …

Neural inverse text normalization

M Sunkara, C Shivade, S Bodapati… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
While there have been several contributions exploring state of the art techniques for text
normalization, the problem of inverse text normalization (ITN) remains relatively unexplored …

On addressing practical challenges for RNN-transducer

R Zhao, J Xue, J Li, W Wei, L He… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
In this paper, several works are proposed to address practical challenges for deploying
RNN Transducer (RNN-T) based speech recognition systems. These challenges are …

Alleviating ASR long-tailed problem by decoupling the learning of representation and classification

K Deng, G Cheng, R Yang, Y Yan - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
Recently, we have witnessed excellent improvement of end-to-end (E2E) automatic speech
recognition (ASR). However, how to tackle the long-tailed data distribution problem while …

Using speech synthesis to train end-to-end spoken language understanding models

L Lugosch, BH Meyer… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
End-to-end models are an attractive new approach to spoken language understanding
(SLU) in which the meaning of an utterance is inferred directly from the raw audio, without …