Contextualized streaming end-to-end speech recognition with trie-based deep biasing and shallow...

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com

Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Cited by 325 - Related articles - All 7 versions

[PDF] arxiv.org

Contextual adapters for personalized speech recognition in neural transducers

KM Sathyendra, T Muniyappa, FJ Chang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Personal rare word recognition in end-to-end Automatic Speech Recognition (E2E ASR)
models is a challenge due to the lack of training data. A standard way to address this issue …

Cited by 61 - Related articles - All 4 versions

[PDF] arxiv.org

Improving end-to-end contextual speech recognition with fine-grained contextual knowledge selection

M Han, L Dong, Z Liang, M Cai, S Zhou… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Nowadays, most methods for end-to-end contextual speech recognition bias the recognition
process towards contextual knowledge. Since all-neural contextual biasing methods rely on …

Cited by 34 - Related articles - All 3 versions

[PDF] arxiv.org

Efficient domain adaptation for speech foundation models

B Li, D Hwang, Z Huo, J Bai, G Prakash… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Foundation models (FMs), that are trained on broad data at scale and are adaptable to a
wide range of downstream tasks, have brought large interest in the research community …

Cited by 14 - Related articles - All 3 versions

[PDF] arxiv.org

Cosmic: Data efficient instruction-tuning for speech in-context learning

J Pan, J Wu, Y Gaur, S Sivasankaran, Z Chen… - arXiv preprint arXiv …, 2023 - arxiv.org

We present a data and cost efficient way of incorporating the speech modality into a large
language model (LLM). The resulting multi-modal LLM is a COntextual Speech Model with …

Cited by 8 - Related articles - All 2 versions

[PDF] arxiv.org

End-to-end speech recognition contextualization with large language models

E Lakomkin, C Wu, Y Fathullah, O Kalinli… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

In recent years, Large Language Models (LLMs) have garnered significant attention from the
research community due to their exceptional performance and generalization capabilities. In …

Cited by 11 - Related articles - All 3 versions

[PDF] arxiv.org

Tree-constrained pointer generator for end-to-end contextual speech recognition

G Sun, C Zhang, PC Woodland - 2021 IEEE Automatic Speech …, 2021 - ieeexplore.ieee.org

Contextual knowledge is important for real-world automatic speech recognition (ASR)
applications. In this paper, a novel tree-constrained pointer generator (TCPGen) component …

Cited by 25 - Related articles - All 3 versions

[PDF] arxiv.org

Contextualized end-to-end speech recognition with contextual phrase prediction network

K Huang, A Zhang, Z Yang, P Guo, B Mu, T Xu… - arXiv preprint arXiv …, 2023 - arxiv.org

Contextual information plays a crucial role in speech recognition technologies and
incorporating it into the end-to-end speech recognition models has drawn immense interest …

Cited by 12 - Related articles - All 4 versions

[HTML] amazon.science

Listen, know and spell: Knowledge-infused subword modeling for improving ASR performance of OOV named entities

N Das, M Sunkara, D Bekal, DH Chau… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Automatic speech recognition (ASR) is increasingly being used in specialized domains such
as medical ASR and news transcription. Owing to the lack of high quality annotated speech …

Cited by 20 - Related articles - All 4 versions

[PDF] arxiv.org

Can contextual biasing remain effective with Whisper and GPT-2?

G Sun, X Zheng, C Zhang, PC Woodland - arXiv preprint arXiv:2306.01942, 2023 - arxiv.org

End-to-end automatic speech recognition (ASR) and large language models, such as
Whisper and GPT-2, have recently been scaled to use vast amounts of training data. Despite …

Cited by 11 - Related articles - All 5 versions