RLHF-V: Towards trustworthy MLLMs via behavior alignment from fine-grained correctional human feedback
Multimodal Large Language Models (MLLMs) have recently demonstrated
impressive capabilities in multimodal understanding, reasoning, and interaction. However …
Receive, reason, and react: Drive as you say, with large language models in autonomous vehicles
The fusion of human-centric design and artificial intelligence capabilities has opened up
new possibilities for next-generation autonomous vehicles that go beyond traditional …
PIVOT: Iterative visual prompting elicits actionable knowledge for VLMs
Vision language models (VLMs) have shown impressive capabilities across a variety of
tasks, from logical reasoning to visual understanding. This opens the door to richer …
GPT4Vis: What can GPT-4 do for zero-shot visual recognition?
This paper does not present a novel method. Instead, it delves into an essential, yet must-
know baseline in light of the latest advancements in Generative Artificial Intelligence …
GPT as psychologist? Preliminary evaluations for GPT-4V on visual affective computing
Multimodal large language models (MLLMs) are designed to process and integrate
information from multiple sources such as text, speech, images, and videos. Despite its …
Towards knowledge-driven autonomous driving
This paper explores the emerging knowledge-driven autonomous driving technologies. Our
investigation highlights the limitations of current autonomous driving systems, in particular …
Data-centric evolution in autonomous driving: A comprehensive survey of big data system, data mining, and closed-loop technologies
The aspiration of the next generation's autonomous driving (AD) technology relies on the
dedicated integration and interaction among intelligent perception, prediction, planning, and …
Large multimodal agents: A survey
Large language models (LLMs) have achieved superior performance in powering text-
based AI agents, endowing them with decision-making and reasoning abilities akin to …
Forging vision foundation models for autonomous driving: Challenges, methodologies, and opportunities
The rise of large foundation models, trained on extensive datasets, is revolutionizing the
field of AI. Models such as SAM, DALL-E2, and GPT-4 showcase their adaptability by …
Scaffolding coordinates to promote vision-language coordination in large multi-modal models
State-of-the-art Large Multi-Modal Models (LMMs) have demonstrated exceptional
capabilities in vision-language tasks. Despite their advanced functionalities, the …