FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

T SpeechTeam - arXiv preprint arXiv:2407.04051, 2024 - arxiv.org
This report introduces FunAudioLLM, a model family designed to enhance natural voice
interactions between humans and large language models (LLMs). At its core are two …

An efficient text augmentation approach for contextualized Mandarin speech recognition

N Zheng, X Wan, K Liu, Z Du, Z Huan - arXiv preprint arXiv:2406.09950, 2024 - arxiv.org
Although contextualized automatic speech recognition (ASR) systems are commonly used to
improve the recognition of uncommon words, their effectiveness is hindered by the inherent …

CB-Whisper: Contextual Biasing Whisper using TTS-based Keyword Spotting

Y Li, Y Li, M Zhang, C Su, M Piao, X Qiao, J Yu… - arXiv preprint arXiv …, 2023 - arxiv.org
End-to-end automatic speech recognition (ASR) systems often struggle to recognize rare
named entities, such as personal names, organizations, or technical terms that are not …

CB-Whisper: Contextual Biasing Whisper Using Open-Vocabulary Keyword-Spotting

Y Li, Y Li, M Zhang, C Su, J Yu, M Piao… - Proceedings of the …, 2024 - aclanthology.org
End-to-end automatic speech recognition (ASR) systems often struggle to recognize rare
named entities, such as personal names, organizations and terminologies that are not …