Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

E-Branchformer: Branchformer with enhanced merging for speech recognition

K Kim, F Wu, Y Peng, J Pan, P Sridhar… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Conformer, combining convolution and self-attention sequentially to capture both local and
global information, has shown remarkable performance and is currently regarded as the …

Prompting large language models for zero-shot domain adaptation in speech recognition

Y Li, Y Wu, J Li, S Liu - 2023 IEEE Automatic Speech …, 2023 - ieeexplore.ieee.org
The integration of Language Models (LMs) has proven to be an effective way to address
domain shifts in speech recognition. However, these approaches usually require a …

Contextualized streaming end-to-end speech recognition with trie-based deep biasing and shallow fusion

D Le, M Jain, G Keren, S Kim, Y Shi… - arXiv preprint arXiv …, 2021 - arxiv.org
How to leverage dynamic contextual information in end-to-end speech recognition has
remained an active research area. Previous solutions to this problem were either designed …

Internal language model training for domain-adaptive end-to-end speech recognition

Z Meng, N Kanda, Y Gaur… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
The efficacy of external language model (LM) integration with existing end-to-end (E2E)
automatic speech recognition (ASR) systems can be improved significantly using the …

Can generative large language models perform ASR error correction?

R Ma, M Qian, P Manakul, M Gales, K Knill - arXiv preprint arXiv …, 2023 - arxiv.org
ASR error correction continues to serve as an important part of post-processing for speech
recognition systems. Traditionally, these models are trained with supervised training using …

SynthASR: Unlocking synthetic data for speech recognition

A Fazel, W Yang, Y Liu, R Barra-Chicote… - arXiv preprint arXiv …, 2021 - arxiv.org
End-to-end (E2E) automatic speech recognition (ASR) models have recently demonstrated
superior performance over the traditional hybrid ASR models. Training an E2E ASR model …

Modular hybrid autoregressive transducer

Z Meng, T Chen, R Prabhavalkar… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Text-only adaptation of a transducer model remains challenging for end-to-end speech
recognition since the transducer has no clearly separated acoustic model (AM), language …

Investigating methods to improve language model integration for attention-based encoder-decoder ASR models

M Zeineldeen, A Glushko, W Michel, A Zeyer… - arXiv preprint arXiv …, 2021 - arxiv.org
Attention-based encoder-decoder (AED) models learn an implicit internal language model
(ILM) from the training transcriptions. The integration with an external LM trained on much …