Recent advances in speech language models: A survey

W Cui, D Yu, X Jiao, Z Meng, G Zhang, Q Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have recently garnered significant attention, primarily for
their capabilities in text-based interactions. However, natural human interaction often relies …

WavChat: A Survey of Spoken Dialogue Models

S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o,
have captured significant attention in the speech domain. Compared to traditional three-tier …

SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation

W Yu, S Wang, X Yang, X Chen, X Tian… - arXiv preprint arXiv …, 2024 - arxiv.org
Full-duplex multimodal large language models (LLMs) provide a unified framework for
addressing diverse speech understanding and generation tasks, enabling more natural and …

TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch

X Song, M Xing, C Ma, S Li, D Wu, B Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
It is well known that LLM-based systems are data-hungry. Recent LLM-based TTS works
typically employ complex data processing pipelines to obtain high-quality training data …

SyllableLM: Learning coarse semantic units for speech language models

A Baade, P Peng, D Harwath - arXiv preprint arXiv:2410.04029, 2024 - arxiv.org
Language models require tokenized inputs. However, tokenization strategies for continuous
data like audio and vision are often based on simple heuristics such as fixed sized …

Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback

GT Lin, PG Shivakumar, A Gourav, Y Gu… - arXiv preprint arXiv …, 2024 - arxiv.org
While textless Spoken Language Models (SLMs) have shown potential in end-to-end
speech-to-speech modeling, they still lag behind text-based Large Language Models …

Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents

B Veluri, BN Peloquin, B Yu, H Gong… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite broad interest in modeling spoken dialogue agents, most approaches are
inherently "half-duplex"--restricted to turn-based interaction with responses requiring explicit …

A suite for acoustic language model evaluation

G Maimon, A Roth, Y Adi - arXiv preprint arXiv:2409.07437, 2024 - arxiv.org
Speech language models have recently demonstrated great potential as universal speech
processing systems. Such models have the ability to model the rich acoustic information …

Roadmap towards Superhuman Speech Understanding using Large Language Models

F Bu, Y Zhang, X Wang, B Wang, Q Liu, H Li - arXiv preprint arXiv …, 2024 - arxiv.org
The success of large language models (LLMs) has prompted efforts to integrate speech and
audio data, aiming to create general foundation models capable of processing both textual …

A Survey on Speech Large Language Models

J Peng, Y Wang, Y Xi, X Li, K Yu - arXiv preprint arXiv:2410.18908, 2024 - arxiv.org
Large Language Models (LLMs) exhibit strong contextual understanding and remarkable
multi-task performance. Therefore, researchers have been seeking to integrate LLMs in the …