Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
End-to-end speech recognition: A survey
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …
E-branchformer: Branchformer with enhanced merging for speech recognition
Conformer, combining convolution and self-attention sequentially to capture both local and
global information, has shown remarkable performance and is currently regarded as the …
Prompting large language models for zero-shot domain adaptation in speech recognition
The integration of Language Models (LMs) has proven to be an effective way to address
domain shifts in speech recognition. However, these approaches usually require a …
Contextualized streaming end-to-end speech recognition with trie-based deep biasing and shallow fusion
How to leverage dynamic contextual information in end-to-end speech recognition has
remained an active research area. Previous solutions to this problem were either designed …
Internal language model training for domain-adaptive end-to-end speech recognition
The efficacy of external language model (LM) integration with existing end-to-end (E2E)
automatic speech recognition (ASR) systems can be improved significantly using the …
Can generative large language models perform ASR error correction?
ASR error correction continues to serve as an important part of post-processing for speech
recognition systems. Traditionally, these models are trained with supervised training using …
SynthASR: Unlocking synthetic data for speech recognition
End-to-end (E2E) automatic speech recognition (ASR) models have recently demonstrated
superior performance over the traditional hybrid ASR models. Training an E2E ASR model …
Modular hybrid autoregressive transducer
Text-only adaptation of a transducer model remains challenging for end-to-end speech
recognition since the transducer has no clearly separated acoustic model (AM), language …
Investigating methods to improve language model integration for attention-based encoder-decoder ASR models
Attention-based encoder-decoder (AED) models learn an implicit internal language model
(ILM) from the training transcriptions. The integration with an external LM trained on much …