Recent advances in speech language models: A survey
Large Language Models (LLMs) have recently garnered significant attention, primarily for
their capabilities in text-based interactions. However, natural human interaction often relies …
their capabilities in text-based interactions. However, natural human interaction often relies …
WavChat: A Survey of Spoken Dialogue Models
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o,
have captured significant attention in the speech domain. Compared to traditional three-tier …
have captured significant attention in the speech domain. Compared to traditional three-tier …
SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation
Full-duplex multimodal large language models (LLMs) provide a unified framework for
addressing diverse speech understanding and generation tasks, enabling more natural and …
addressing diverse speech understanding and generation tasks, enabling more natural and …
TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch
It is well known that LLM-based systems are data-hungry. Recent LLM-based TTS works
typically employ complex data processing pipelines to obtain high-quality training data …
typically employ complex data processing pipelines to obtain high-quality training data …
Syllablelm: Learning coarse semantic units for speech language models
Language models require tokenized inputs. However, tokenization strategies for continuous
data like audio and vision are often based on simple heuristics such as fixed sized …
data like audio and vision are often based on simple heuristics such as fixed sized …
Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback
While textless Spoken Language Models (SLMs) have shown potential in end-to-end
speech-to-speech modeling, they still lag behind text-based Large Language Models …
speech-to-speech modeling, they still lag behind text-based Large Language Models …
Beyond Turn-Based Interfaces: Synchronous LLMs as Full-Duplex Dialogue Agents
Despite broad interest in modeling spoken dialogue agents, most approaches are
inherently" half-duplex"--restricted to turn-based interaction with responses requiring explicit …
inherently" half-duplex"--restricted to turn-based interaction with responses requiring explicit …
A suite for acoustic language model evaluation
Speech language models have recently demonstrated great potential as universal speech
processing systems. Such models have the ability to model the rich acoustic information …
processing systems. Such models have the ability to model the rich acoustic information …
Roadmap towards Superhuman Speech Understanding using Large Language Models
The success of large language models (LLMs) has prompted efforts to integrate speech and
audio data, aiming to create general foundation models capable of processing both textual …
audio data, aiming to create general foundation models capable of processing both textual …
A Survey on Speech Large Language Models
Large Language Models (LLMs) exhibit strong contextual understanding and remarkable
multi-task performance. Therefore, researchers have been seeking to integrate LLMs in the …
multi-task performance. Therefore, researchers have been seeking to integrate LLMs in the …