RLHF-V: Towards trustworthy MLLMs via behavior alignment from fine-grained correctional human feedback
Multimodal Large Language Models (MLLMs) have recently demonstrated
impressive capabilities in multimodal understanding, reasoning, and interaction. However …
Receive, reason, and react: Drive as you say, with large language models in autonomous vehicles
The fusion of human-centric design and artificial intelligence capabilities has opened up
new possibilities for next-generation autonomous vehicles that go beyond traditional …
PIVOT: Iterative visual prompting elicits actionable knowledge for VLMs
Vision language models (VLMs) have shown impressive capabilities across a variety of
tasks, from logical reasoning to visual understanding. This opens the door to richer …
GPT4Vis: What can GPT-4 do for zero-shot visual recognition?
This paper does not present a novel method. Instead, it delves into an essential, yet must-
know baseline in light of the latest advancements in Generative Artificial Intelligence …
GPT as psychologist? Preliminary evaluations for GPT-4V on visual affective computing
Multimodal large language models (MLLMs) are designed to process and integrate
information from multiple sources such as text, speech, images, and videos. Despite its …
Towards knowledge-driven autonomous driving
This paper explores the emerging knowledge-driven autonomous driving technologies. Our
investigation highlights the limitations of current autonomous driving systems, in particular …
Data-centric evolution in autonomous driving: A comprehensive survey of big data system, data mining, and closed-loop technologies
The aspiration of the next generation's autonomous driving (AD) technology relies on the
dedicated integration and interaction among intelligent perception, prediction, planning, and …
Large multimodal agents: A survey
Large language models (LLMs) have achieved superior performance in powering text-
based AI agents, endowing them with decision-making and reasoning abilities akin to …
Forging vision foundation models for autonomous driving: Challenges, methodologies, and opportunities
The rise of large foundation models, trained on extensive datasets, is revolutionizing the
field of AI. Models such as SAM, DALL-E2, and GPT-4 showcase their adaptability by …
Scaffolding coordinates to promote vision-language coordination in large multi-modal models
State-of-the-art Large Multi-Modal Models (LMMs) have demonstrated exceptional
capabilities in vision-language tasks. Despite their advanced functionalities, the …