Improving performance of end-to-end ASR on numeric sequences

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com

Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Cited by 338 · Related articles · All 7 versions

Deep shallow fusion for RNN-T personalization

D Le, G Keren, J Chan, J Mahadeokar… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org

End-to-end models in general, and Recurrent Neural Network Transducer (RNN-T) in
particular, have gained significant traction in the automatic speech recognition community in …

Cited by 74 · Related articles · All 3 versions

Deliberation model based two-pass end-to-end speech recognition

K Hu, TN Sainath, R Pang… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

End-to-end (E2E) models have made rapid progress in automatic speech recognition (ASR)
and perform competitively relative to conventional models. To further improve the quality, a …

Cited by 89 · Related articles · All 7 versions

Modular hybrid autoregressive transducer

Z Meng, T Chen, R Prabhavalkar… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org

Text-only adaptation of a transducer model remains challenging for end-to-end speech
recognition since the transducer has no clearly separated acoustic model (AM), language …

Cited by 20 · Related articles · All 3 versions

How might we create better benchmarks for speech recognition?

A Aksënova, D van Esch, J Flynn… - Proceedings of the 1st …, 2021 - aclanthology.org

The applications of automatic speech recognition (ASR) systems are proliferating, in part
due to recent significant quality improvements. However, as recent work indicates, even …

Cited by 35 · Related articles · All 9 versions

Transformer based deliberation for two-pass speech recognition

K Hu, R Pang, TN Sainath… - 2021 IEEE Spoken …, 2021 - ieeexplore.ieee.org

Interactive speech recognition systems must generate words quickly while also producing
accurate results. Two-pass models excel at these requirements by employing a first-pass …

Cited by 38 · Related articles · All 5 versions

Neural inverse text normalization

M Sunkara, C Shivade, S Bodapati… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org

While there have been several contributions exploring state of the art techniques for text
normalization, the problem of inverse text normalization (ITN) remains relatively unexplored …

Cited by 38 · Related articles · All 4 versions

On addressing practical challenges for RNN-transducer

R Zhao, J Xue, J Li, W Wei, L He… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org

In this paper, several works are proposed to address practical challenges for deploying
RNN Transducer (RNN-T) based speech recognition systems. These challenges are …

Cited by 27 · Related articles · All 5 versions

Alleviating ASR long-tailed problem by decoupling the learning of representation and classification

K Deng, G Cheng, R Yang, Y Yan - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org

Recently, we have witnessed excellent improvement of end-to-end (E2E) automatic speech
recognition (ASR). However, how to tackle the long-tailed data distribution problem while …

Cited by 19 · Related articles · All 2 versions

Using speech synthesis to train end-to-end spoken language understanding models

L Lugosch, BH Meyer… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

End-to-end models are an attractive new approach to spoken language understanding
(SLU) in which the meaning of an utterance is inferred directly from the raw audio, without …

Cited by 43 · Related articles · All 3 versions