Robust acoustic and semantic contextual biasing in neural transducers for speech recognition

Efficient sequence transduction by jointly predicting tokens and durations

H Xu, F Jia, S Majumdar, H Huang… - International …, 2023 - proceedings.mlr.press

This paper introduces a novel Token-and-Duration Transducer (TDT) architecture for
sequence-to-sequence tasks. TDT extends conventional RNN-Transducer architectures by …

Cited by 8 - Related articles - All 8 versions

End-to-end speech recognition contextualization with large language models

E Lakomkin, C Wu, Y Fathullah, O Kalinli… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

In recent years, Large Language Models (LLMs) have garnered significant attention from the
research community due to their exceptional performance and generalization capabilities. In …

Cited by 12 - Related articles - All 3 versions

Token-level serialized output training for joint streaming ASR and ST leveraging textual alignments

S Papi, P Wang, J Chen, J Xue, J Li… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

In real-world applications, users often require both translations and transcriptions of speech
to enhance their comprehension, particularly in streaming scenarios where incremental …

Cited by 6 - Related articles - All 5 versions

Contextual biasing of named-entities with large language models

C Sun, Z Ahmed, Y Ma, Z Liu, L Kabela… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

We explore contextual biasing with Large Language Models (LLMs) to enhance Automatic
Speech Recognition (ASR) in second-pass rescoring. Our approach introduces the …

Cited by 6 - Related articles - All 4 versions

Improving large-scale deep biasing with phoneme features and text-only data in streaming transducer

J Qiu, L Huang, B Li, J Zhang, L Lu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

Deep biasing for the Transducer can improve the recognition performance of rare words or
contextual entities, which is essential in practical applications, especially for streaming …

Cited by 2 - Related articles - All 3 versions

Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition

K Huang, A Zhang, B Zhang, T Xu… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org

The attention-based deep contextual biasing method has been demonstrated to effectively
improve the recognition performance of end-to-end automatic speech recognition (ASR) …

Cited by 2 - Related articles - All 3 versions

Speed of Light Exact Greedy Decoding for RNN-T Speech Recognition Models on GPU

D Galvez, V Bataev, H Xu, T Kaldewey - arXiv preprint arXiv:2406.03791, 2024 - arxiv.org

The vast majority of inference time for RNN Transducer (RNN-T) models today is spent on
decoding. Current state-of-the-art RNN-T decoding implementations leave the GPU idle …

Cited by 1 - Related articles - All 2 versions

Key Frame Mechanism For Efficient Conformer Based End-to-end Speech Recognition

P Fan, C Shan, S Sun, Q Yang… - IEEE Signal Processing …, 2023 - ieeexplore.ieee.org

Recently, Conformer as a backbone network for end-to-end automatic speech recognition
achieved state-of-the-art performance. The Conformer block leverages a self-attention …

Cited by 1 - Related articles - All 4 versions

Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss

M Shakeel, Y Sudo, Y Peng, S Watanabe - arXiv preprint arXiv …, 2024 - arxiv.org

Contextualized end-to-end automatic speech recognition has been an active research area,
with recent efforts focusing on the implicit learning of contextual phrases based on the final …

Improving ASR Contextual Biasing with Guided Attention

J Tang, K Kim, S Shon, F Wu… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

In this paper, we propose a Guided Attention (GA) auxiliary training loss, which improves the
effectiveness and robustness of automatic speech recognition (ASR) contextual biasing …

Cited by 2 - Related articles - All 3 versions