Point-Bind & Point-LLM: Aligning point cloud with multi-modality for 3D understanding, generation...

C Cui, Y Ma, X Cao, W Ye, Y Zhou… - Proceedings of the …, 2024 - openaccess.thecvf.com

With the emergence of Large Language Models (LLMs) and Vision Foundation Models
(VFMs), multimodal AI systems benefiting from large models have the potential to equally …

Cited by: 112 | Related articles | All 7 versions

[PDF] thecvf.com

OneLLM: One framework to align all modalities with language

J Han, K Gong, Y Zhang, J Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com

Multimodal large language models (MLLMs) have gained significant attention due to their
strong multimodal understanding capability. However, existing works rely heavily on modality …

Cited by: 25 | Related articles | All 3 versions

[PDF] thecvf.com

Binding touch to everything: Learning unified multimodal tactile representations

F Yang, C Feng, Z Chen, H Park… - Proceedings of the …, 2024 - openaccess.thecvf.com

The ability to associate touch with other modalities has huge implications for humans and
computational systems. However, multimodal learning with touch remains challenging due to …

Cited by: 14 | Related articles | All 4 versions

[PDF] thecvf.com

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding Reasoning and Planning

S Chen, X Chen, C Zhang, M Li, G Yu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Recent progress in Large Multimodal Models (LMM) has opened up great
possibilities for various applications in the field of human-machine interactions. However, …

Cited by: 15 | Related articles | All 3 versions

[PDF] thecvf.com

GPT4Point: A unified framework for point-language understanding and generation

Z Qi, Y Fang, Z Sun, X Wu, T Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Multimodal Large Language Models (MLLMs) have excelled in 2D image-text
comprehension and image generation, but their understanding of the 3D world is notably …

Cited by: 11 | Related articles | All 3 versions

[PDF] arxiv.org

Large language model can interpret latent space of sequential recommender

Z Yang, J Wu, Y Luo, J Zhang, Y Yuan, A Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Sequential recommendation is to predict the next item of interest for a user, based on her/his
interaction history with previous items. In conventional sequential recommenders, a common …

Cited by: 24 | Related articles | All 2 versions

[PDF] arxiv.org

LiDAR-LLM: Exploring the potential of large language models for 3D LiDAR understanding

S Yang, J Liu, R Zhang, M Pan, Z Guo, X Li… - arXiv preprint arXiv …, 2023 - arxiv.org

Recently, Large Language Models (LLMs) and Multimodal Large Language Models
(MLLMs) have shown promise in instruction following and 2D image understanding. While …

Cited by: 24 | Related articles | All 3 versions

[PDF] openreview.net

LLaMA-Adapter: Efficient fine-tuning of large language models with zero-initialized attention

R Zhang, J Han, C Liu, A Zhou, P Lu… - The Twelfth …, 2024 - openreview.net

With the rising tide of large language models (LLMs), there has been a growing interest in
developing general-purpose instruction-following models, e.g., ChatGPT. To this end, we …

Cited by: 22 | Related articles

[PDF] arxiv.org

MathVerse: Does your multi-modal LLM truly see the diagrams in visual math problems?

R Zhang, D Jiang, Y Zhang, H Lin, Z Guo, P Qiu… - arXiv preprint arXiv …, 2024 - arxiv.org

The remarkable progress of Multi-modal Large Language Models (MLLMs) has garnered
unparalleled attention, due to their superior performance in visual contexts. However, their …

Cited by: 28 | Related articles | All 2 versions

[PDF] arxiv.org

EarthGPT: A universal multi-modal large language model for multi-sensor image comprehension in remote sensing domain

W Zhang, M Cai, T Zhang, Y Zhuang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Multimodal large language models (MLLMs) have demonstrated remarkable success in
vision and visual-language tasks within the natural image domain. Owing to the significant …

Cited by: 11 | Related articles | All 3 versions