DUB: Discrete unit back-translation for speech translation

D Zhang, S Li, X Zhang, J Zhan, P Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

Multi-modal large language models are regarded as a crucial step towards Artificial General
Intelligence (AGI) and have garnered significant interest with the emergence of ChatGPT …

Cited by 159 · Related articles · All 6 versions

ML-LMCL: Mutual learning and large-margin contrastive learning for improving ASR robustness in spoken language understanding

X Cheng, B Cao, Q Ye, Z Zhu, H Li, Y Zou - arXiv preprint arXiv …, 2023 - arxiv.org

Spoken language understanding (SLU) is a fundamental task in the task-oriented dialogue
systems. However, the inevitable errors from automatic speech recognition (ASR) usually …

Cited by 49 · Related articles · All 4 versions

SeqXGPT: Sentence-level AI-generated text detection

P Wang, L Li, K Ren, B Jiang, D Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Widely applied large language models (LLMs) can generate human-like content, raising
concerns about the abuse of LLMs. Therefore, it is important to build strong AI-generated text …

Cited by 25 · Related articles · All 5 versions

MRRL: Modifying the reference via reinforcement learning for non-autoregressive joint multiple intent detection and slot filling

X Cheng, Z Zhu, B Cao, Q Ye, Y Zou - Findings of the Association …, 2023 - aclanthology.org

With the rise of non-autoregressive approach, some non-autoregressive models for joint
multiple intent detection and slot filling have obtained the promising inference speed …

Cited by 22 · Related articles · All 3 versions

PolyVoice: Language models for speech to speech translation

Q Dong, Z Huang, Q Tian, C Xu, T Ko, Y Zhao… - arXiv preprint arXiv …, 2023 - arxiv.org

We propose PolyVoice, a language model-based framework for speech-to-speech
translation (S2ST) system. Our framework consists of two language models: a translation …

Cited by 18 · Related articles · All 2 versions

Towards multi-intent spoken language understanding via hierarchical attention and optimal transport

X Cheng, Z Zhu, H Li, Y Li, X Zhuang… - Proceedings of the AAAI …, 2024 - ojs.aaai.org

Multi-Intent spoken language understanding (SLU) can handle complicated utterances
expressing multiple intents, which has attracted increasing attention from researchers …

Cited by 12 · Related articles

Exploring speech recognition, translation, and understanding with discrete speech units: A comparative study

X Chang, B Yan, K Choi, JW Jung, Y Lu… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Speech signals, typically sampled at rates in the tens of thousands per second, contain
redundancies, evoking inefficiencies in sequence modeling. High-dimensional speech …

Cited by 27 · Related articles · All 3 versions

MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model

J Shi, X Ma, H Inaguma, A Sun, S Watanabe - arXiv preprint arXiv …, 2024 - arxiv.org

Speech discrete representation has proven effective in various downstream applications
due to its superior compression rate of the waveform, fast convergence during training, and …

Cited by 3 · Related articles · All 2 versions

Translatotron 3: Speech to speech translation with monolingual data

E Nachmani, A Levkovitch, Y Ding… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

This paper presents Translatotron 3, a novel approach to unsupervised direct speech-to-
speech translation from monolingual speech-text datasets by combining masked …

Cited by 7 · Related articles · All 3 versions

A multitask co-training framework for improving speech translation by leveraging speech recognition and machine translation tasks

Y Zhou, Y Yuan, X Shi - Neural Computing and Applications, 2024 - Springer

End-to-end speech translation (ST) has attracted substantial attention due to its less error
accumulation and lower latency. Based on triplet ST data ⟨speech-transcription …

Cited by 1 · Related articles