SpeechGPT: Empowering large language models with intrinsic cross-modal conversational abilities

D Zhang, S Li, X Zhang, J Zhan, P Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Multi-modal large language models are regarded as a crucial step towards Artificial General
Intelligence (AGI) and have garnered significant interest with the emergence of ChatGPT …

ML-LMCL: Mutual learning and large-margin contrastive learning for improving ASR robustness in spoken language understanding

X Cheng, B Cao, Q Ye, Z Zhu, H Li, Y Zou - arXiv preprint arXiv …, 2023 - arxiv.org
Spoken language understanding (SLU) is a fundamental task in task-oriented dialogue
systems. However, the inevitable errors from automatic speech recognition (ASR) usually …

SeqXGPT: Sentence-level AI-generated text detection

P Wang, L Li, K Ren, B Jiang, D Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Widely applied large language models (LLMs) can generate human-like content, raising
concerns about the abuse of LLMs. Therefore, it is important to build strong AI-generated text …

MRRL: Modifying the reference via reinforcement learning for non-autoregressive joint multiple intent detection and slot filling

X Cheng, Z Zhu, B Cao, Q Ye, Y Zou - Findings of the Association …, 2023 - aclanthology.org
With the rise of the non-autoregressive approach, some non-autoregressive models for joint
multiple intent detection and slot filling have achieved promising inference speed …

PolyVoice: Language models for speech to speech translation

Q Dong, Z Huang, Q Tian, C Xu, T Ko, Y Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org
We propose PolyVoice, a language model-based framework for speech-to-speech
translation (S2ST). Our framework consists of two language models: a translation …

Towards multi-intent spoken language understanding via hierarchical attention and optimal transport

X Cheng, Z Zhu, H Li, Y Li, X Zhuang… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Multi-Intent spoken language understanding (SLU) can handle complicated utterances
expressing multiple intents, which has attracted increasing attention from researchers …

Exploring speech recognition, translation, and understanding with discrete speech units: A comparative study

X Chang, B Yan, K Choi, JW Jung, Y Lu… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Speech signals, typically sampled at rates in the tens of thousands per second, contain
redundancies, leading to inefficiencies in sequence modeling. High-dimensional speech …

MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model

J Shi, X Ma, H Inaguma, A Sun, S Watanabe - arXiv preprint arXiv …, 2024 - arxiv.org
Discrete speech representations have proven effective in various downstream applications
due to their superior waveform compression rate, fast convergence during training, and …

Translatotron 3: Speech to speech translation with monolingual data

E Nachmani, A Levkovitch, Y Ding… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
This paper presents Translatotron 3, a novel approach to unsupervised direct speech-to-
speech translation from monolingual speech-text datasets by combining masked …

A multitask co-training framework for improving speech translation by leveraging speech recognition and machine translation tasks

Y Zhou, Y Yuan, X Shi - Neural Computing and Applications, 2024 - Springer
End-to-end speech translation (ST) has attracted substantial attention due to its reduced error
accumulation and lower latency. Based on triplet ST data ⟨speech-transcription …