Deep shallow fusion for RNN-T personalization

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com

Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Cited by 325 Related articles All 7 versions

[PDF] ieee.org

A metaverse: Taxonomy, components, applications, and open challenges

SM Park, YG Kim - IEEE Access, 2022 - ieeexplore.ieee.org

Unlike previous studies on the Metaverse based on Second Life, the current Metaverse is
based on the social value of Generation Z that online and offline selves are not different …

Cited by 1436 Related articles All 6 versions

[PDF] arxiv.org

Contextual adapters for personalized speech recognition in neural transducers

KM Sathyendra, T Muniyappa, FJ Chang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Personal rare word recognition in end-to-end Automatic Speech Recognition (E2E ASR)
models is a challenge due to the lack of training data. A standard way to address this issue …

Cited by 61 Related articles All 4 versions

[PDF] arxiv.org

Contextualized streaming end-to-end speech recognition with trie-based deep biasing and shallow fusion

D Le, M Jain, G Keren, S Kim, Y Shi… - arXiv preprint arXiv …, 2021 - arxiv.org

How to leverage dynamic contextual information in end-to-end speech recognition has
remained an active research area. Previous solutions to this problem were either designed …

Cited by 73 Related articles All 5 versions

[PDF] arxiv.org

Improving end-to-end contextual speech recognition with fine-grained contextual knowledge selection

M Han, L Dong, Z Liang, M Cai, S Zhou… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Nowadays, most methods for end-to-end contextual speech recognition bias the recognition
process towards contextual knowledge. Since all-neural contextual biasing methods rely on …

Cited by 34 Related articles All 3 versions

[PDF] amazon.science

Personalization of CTC speech recognition models

S Dingliwal, M Sunkara, S Ronanki… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org

End-to-end speech recognition models trained using joint Connectionist Temporal
Classification (CTC)-Attention loss have gained popularity recently. In these models, a non …

Cited by 25 Related articles All 3 versions

[PDF] arxiv.org

End-to-end speech recognition contextualization with large language models

E Lakomkin, C Wu, Y Fathullah, O Kalinli… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

In recent years, Large Language Models (LLMs) have garnered significant attention from the
research community due to their exceptional performance and generalization capabilities. In …

Cited by 11 Related articles All 3 versions

[PDF] arxiv.org

Tree-constrained pointer generator for end-to-end contextual speech recognition

G Sun, C Zhang, PC Woodland - 2021 IEEE Automatic Speech …, 2021 - ieeexplore.ieee.org

Contextual knowledge is important for real-world automatic speech recognition (ASR)
applications. In this paper, a novel tree-constrained pointer generator (TCPGen) component …

Cited by 25 Related articles All 3 versions

[PDF] arxiv.org

Can contextual biasing remain effective with Whisper and GPT-2?

G Sun, X Zheng, C Zhang, PC Woodland - arXiv preprint arXiv:2306.01942, 2023 - arxiv.org

End-to-end automatic speech recognition (ASR) and large language models, such as
Whisper and GPT-2, have recently been scaled to use vast amounts of training data. Despite …

Cited by 11 Related articles All 5 versions

[PDF] arxiv.org

Towards contextual spelling correction for customization of end-to-end speech recognition systems

X Wang, Y Liu, J Li, V Miljanic, S Zhao… - … /ACM Transactions on …, 2022 - ieeexplore.ieee.org

Contextual biasing is an important and challenging task for end-to-end automatic speech
recognition (ASR) systems, which aims to achieve better recognition performance by biasing …

Cited by 18 Related articles All 4 versions