Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Contextual adapters for personalized speech recognition in neural transducers

KM Sathyendra, T Muniyappa, FJ Chang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Personal rare word recognition in end-to-end Automatic Speech Recognition (E2E ASR)
models is a challenge due to the lack of training data. A standard way to address this issue …

Improving end-to-end contextual speech recognition with fine-grained contextual knowledge selection

M Han, L Dong, Z Liang, M Cai, S Zhou… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Nowadays, most methods for end-to-end contextual speech recognition bias the recognition
process towards contextual knowledge. Since all-neural contextual biasing methods rely on …

Efficient domain adaptation for speech foundation models

B Li, D Hwang, Z Huo, J Bai, G Prakash… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Foundation models (FMs), that are trained on broad data at scale and are adaptable to a
wide range of downstream tasks, have brought large interest in the research community …

COSMIC: Data efficient instruction-tuning for speech in-context learning

J Pan, J Wu, Y Gaur, S Sivasankaran, Z Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
We present a data and cost efficient way of incorporating the speech modality into a large
language model (LLM). The resulting multi-modal LLM is a COntextual Speech Model with …

End-to-end speech recognition contextualization with large language models

E Lakomkin, C Wu, Y Fathullah, O Kalinli… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
In recent years, Large Language Models (LLMs) have garnered significant attention from the
research community due to their exceptional performance and generalization capabilities. In …

Tree-constrained pointer generator for end-to-end contextual speech recognition

G Sun, C Zhang, PC Woodland - 2021 IEEE Automatic Speech …, 2021 - ieeexplore.ieee.org
Contextual knowledge is important for real-world automatic speech recognition (ASR)
applications. In this paper, a novel tree-constrained pointer generator (TCPGen) component …

Contextualized end-to-end speech recognition with contextual phrase prediction network

K Huang, A Zhang, Z Yang, P Guo, B Mu, T Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Contextual information plays a crucial role in speech recognition technologies and
incorporating it into the end-to-end speech recognition models has drawn immense interest …

Listen, know and spell: Knowledge-infused subword modeling for improving ASR performance of OOV named entities

N Das, M Sunkara, D Bekal, DH Chau… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Automatic speech recognition (ASR) is increasingly being used in specialized domains such
as medical ASR and news transcription. Owing to the lack of high quality annotated speech …

Can contextual biasing remain effective with Whisper and GPT-2?

G Sun, X Zheng, C Zhang, PC Woodland - arXiv preprint arXiv:2306.01942, 2023 - arxiv.org
End-to-end automatic speech recognition (ASR) and large language models, such as
Whisper and GPT-2, have recently been scaled to use vast amounts of training data. Despite …