A survey on multimodal large language models for autonomous driving

C Cui, Y Ma, X Cao, W Ye, Y Zhou… - Proceedings of the …, 2024 - openaccess.thecvf.com
With the emergence of Large Language Models (LLMs) and Vision Foundation Models
(VFMs), multimodal AI systems benefiting from large models have the potential to equally …

OneLLM: One framework to align all modalities with language

J Han, K Gong, Y Zhang, J Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal large language models (MLLMs) have gained significant attention due to their
strong multimodal understanding capability. However, existing works rely heavily on modality …

Binding touch to everything: Learning unified multimodal tactile representations

F Yang, C Feng, Z Chen, H Park… - Proceedings of the …, 2024 - openaccess.thecvf.com
The ability to associate touch with other modalities has huge implications for humans and
computational systems. However, multimodal learning with touch remains challenging due to …

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

S Chen, X Chen, C Zhang, M Li, G Yu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent progress in Large Multimodal Models (LMM) has opened up great
possibilities for various applications in the field of human-machine interactions. However …

GPT4Point: A unified framework for point-language understanding and generation

Z Qi, Y Fang, Z Sun, X Wu, T Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal Large Language Models (MLLMs) have excelled in 2D image-text
comprehension and image generation but their understanding of the 3D world is notably …

Large language model can interpret latent space of sequential recommender

Z Yang, J Wu, Y Luo, J Zhang, Y Yuan, A Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Sequential recommendation is to predict the next item of interest for a user, based on her/his
interaction history with previous items. In conventional sequential recommenders, a common …

LiDAR-LLM: Exploring the potential of large language models for 3D LiDAR understanding

S Yang, J Liu, R Zhang, M Pan, Z Guo, X Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, Large Language Models (LLMs) and Multimodal Large Language Models
(MLLMs) have shown promise in instruction following and 2D image understanding. While …

LLaMA-Adapter: Efficient fine-tuning of large language models with zero-initialized attention

R Zhang, J Han, C Liu, A Zhou, P Lu… - The Twelfth …, 2024 - openreview.net
With the rising tide of large language models (LLMs), there has been a growing interest in
developing general-purpose instruction-following models, e.g., ChatGPT. To this end, we …

MathVerse: Does your multi-modal LLM truly see the diagrams in visual math problems?

R Zhang, D Jiang, Y Zhang, H Lin, Z Guo, P Qiu… - arXiv preprint arXiv …, 2024 - arxiv.org
The remarkable progress of Multi-modal Large Language Models (MLLMs) has garnered
unparalleled attention, due to their superior performance in visual contexts. However, their …

EarthGPT: A universal multi-modal large language model for multi-sensor image comprehension in remote sensing domain

W Zhang, M Cai, T Zhang, Y Zhuang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Multimodal large language models (MLLMs) have demonstrated remarkable success in
vision and visual-language tasks within the natural image domain. Owing to the significant …