FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
T SpeechTeam - arXiv preprint arXiv:2407.04051, 2024 - arxiv.org
This report introduces FunAudioLLM, a model family designed to enhance natural voice
interactions between humans and large language models (LLMs). At its core are two …
An efficient text augmentation approach for contextualized Mandarin speech recognition
N Zheng, X Wan, K Liu, Z Du, Z Huan - arXiv preprint arXiv:2406.09950, 2024 - arxiv.org
Although contextualized automatic speech recognition (ASR) systems are commonly used to
improve the recognition of uncommon words, their effectiveness is hindered by the inherent …
CB-Whisper: Contextual Biasing Whisper using TTS-based Keyword Spotting
End-to-end automatic speech recognition (ASR) systems often struggle to recognize rare
name entities, such as personal names, organizations, or technical terms that are not …
CB-Whisper: Contextual Biasing Whisper Using Open-Vocabulary Keyword-Spotting
End-to-end automatic speech recognition (ASR) systems often struggle to recognize rare
name entities, such as personal names, organizations and terminologies that are not …