WavChat: A Survey of Spoken Dialogue Models

S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o,
have captured significant attention in the speech domain. Compared to traditional three-tier …

VoiceBench: Benchmarking LLM-Based Voice Assistants

Y Chen, X Yue, C Zhang, X Gao, RT Tan… - arXiv preprint arXiv …, 2024 - arxiv.org
Building on the success of large language models (LLMs), recent advancements such as
GPT-4o have enabled real-time speech interactions through LLM-based voice assistants …

SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context

J Li, S Tao, Y Yan, X Gu, H Xu, X Zheng, Y Lyu… - arXiv preprint arXiv …, 2024 - arxiv.org
Endeavors have been made to explore Large Language Models for video analysis
(Video-LLMs), particularly in understanding and interpreting long videos. However, existing Video …

From Specific-MLLM to Omni-MLLM: A Survey about the MLLMs aligned with Multi-Modality

S Jiang, J Liang, M Liu, B Qin - arXiv preprint arXiv:2412.11694, 2024 - arxiv.org
From the Specific-MLLM, which excels in single-modal tasks, to the Omni-MLLM, which
extends the range of general modalities, this evolution aims to achieve understanding and …

Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective

S Xie, W Zu, M Zhao, D Su, S Liu, R Shi, G Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Autoregression in large language models (LLMs) has shown impressive scalability by
unifying all language tasks into the next token prediction paradigm. Recently, there is a …

ChipAlign: Instruction Alignment in Large Language Models for Chip Design via Geodesic Interpolation

C Deng, Y Bai, H Ren - arXiv preprint arXiv:2412.19819, 2024 - arxiv.org
Recent advancements in large language models (LLMs) have expanded their application
across various domains, including chip design, where domain-adapted chip models like …