HallusionBench: an advanced diagnostic suite for entangled language hallucination and visual illusion in large vision-language models
We introduce "HallusionBench," a comprehensive benchmark designed for the evaluation of
image-context reasoning. This benchmark presents significant challenges to advanced large …
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
Large language models (LLMs), after being aligned with vision models and integrated into
vision-language models (VLMs), can bring impressive improvement in image reasoning …
MMT-Bench: A comprehensive multimodal benchmark for evaluating large vision-language models towards multitask AGI
Large Vision-Language Models (LVLMs) show significant strides in general-purpose
multimodal applications such as visual dialogue and embodied navigation. However …
Have we built machines that think like people?
A chief goal of artificial intelligence is to build machines that think like people. Yet it has
been argued that deep neural network architectures fail to accomplish this. Researchers …
VGA: Vision GUI Assistant - Minimizing Hallucinations through Image-Centric Fine-Tuning
M Ziyang, Y Dai, Z Gong, S Guo… - Findings of the …, 2024 - aclanthology.org
Large Vision-Language Models (VLMs) have already been applied to the
understanding of Graphical User Interfaces (GUIs) and have achieved notable results …
IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models
The advent of Vision Language Models (VLM) has allowed researchers to investigate the
visual understanding of a neural network using natural language. Beyond object …
Navigating the risks: A survey of security, privacy, and ethics threats in LLM-based agents
With the continuous development of large language models (LLMs), transformer-based
models have made groundbreaking advances in numerous natural language processing …
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities
Spatial expressions in situated communication can be ambiguous, as their meanings vary
depending on the frames of reference (FoR) adopted by speakers and listeners. While …
Evaluating Vision-Language Models on Bistable Images
A Panagopoulou, C Melkin… - arXiv preprint arXiv …, 2024 - arxiv.org
Bistable images, also known as ambiguous or reversible images, present visual stimuli that
can be seen in two distinct interpretations, though not simultaneously by the observer. In this …
Is Cognition consistent with Perception? Assessing and Mitigating Multimodal Knowledge Conflicts in Document Understanding
Multimodal large language models (MLLMs) have shown impressive capabilities in
document understanding, a rapidly growing research area with significant industrial demand …