SpeechGPT: Empowering large language models with intrinsic cross-modal conversational abilities
Multi-modal large language models are regarded as a crucial step towards Artificial General
Intelligence (AGI) and have garnered significant interest with the emergence of ChatGPT …
ML-LMCL: Mutual learning and large-margin contrastive learning for improving ASR robustness in spoken language understanding
Spoken language understanding (SLU) is a fundamental task in the task-oriented dialogue
systems. However, the inevitable errors from automatic speech recognition (ASR) usually …
SeqXGPT: Sentence-level AI-generated text detection
Widely applied large language models (LLMs) can generate human-like content, raising
concerns about the abuse of LLMs. Therefore, it is important to build strong AI-generated text …
MRRL: Modifying the reference via reinforcement learning for non-autoregressive joint multiple intent detection and slot filling
With the rise of non-autoregressive approach, some non-autoregressive models for joint
multiple intent detection and slot filling have obtained the promising inference speed …
PolyVoice: Language models for speech to speech translation
We propose PolyVoice, a language model-based framework for speech-to-speech
translation (S2ST) system. Our framework consists of two language models: a translation …
Towards multi-intent spoken language understanding via hierarchical attention and optimal transport
X Cheng, Z Zhu, H Li, Y Li, X Zhuang… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Multi-Intent spoken language understanding (SLU) can handle complicated utterances
expressing multiple intents, which has attracted increasing attention from researchers …
Exploring speech recognition, translation, and understanding with discrete speech units: A comparative study
Speech signals, typically sampled at rates in the tens of thousands per second, contain
redundancies, evoking inefficiencies in sequence modeling. High-dimensional speech …
MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model
Speech discrete representation has proven effective in various downstream applications
due to its superior compression rate of the waveform, fast convergence during training, and …
Translatotron 3: Speech to speech translation with monolingual data
This paper presents Translatotron 3, a novel approach to unsupervised direct speech-to-
speech translation from monolingual speech-text datasets by combining masked …
A multitask co-training framework for improving speech translation by leveraging speech recognition and machine translation tasks
Y Zhou, Y Yuan, X Shi - Neural Computing and Applications, 2024 - Springer
End-to-end speech translation (ST) has attracted substantial attention due to its less error
accumulation and lower latency. Based on triplet ST data ⟨speech-transcription …