A survey on multimodal large language models

S Yin, C Fu, S Zhao, K Li, X Sun, T Xu… - National Science …, 2024 - academic.oup.com
Recently, Multimodal Large Language Model (MLLM) represented by GPT-4V has
been a new rising research hotspot, which uses powerful Large Language Models (LLMs) …

A survey on hallucination in large vision-language models

H Liu, W Xue, Y Chen, D Chen, X Zhao, K Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent development of Large Vision-Language Models (LVLMs) has attracted growing
attention within the AI landscape for its practical implementation potential. However, …

A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions

L Huang, W Yu, W Ma, W Zhong, Z Feng… - ACM Transactions on …, 2023 - dl.acm.org
The emergence of large language models (LLMs) has marked a significant breakthrough in
natural language processing (NLP), fueling a paradigm shift in information acquisition …

Unified hallucination detection for multimodal large language models

X Chen, C Wang, Y Xue, N Zhang, X Yang, Q Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite significant strides in multimodal tasks, Multimodal Large Language Models (MLLMs)
are plagued by the critical issue of hallucination. The reliable detection of such …

A survey of multimodal large language model from a data-centric perspective

T Bai, H Liang, B Wan, Y Xu, X Li, S Li, L Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal large language models (MLLMs) enhance the capabilities of standard large
language models by integrating and processing data from multiple modalities, including text …

Getting it right: Improving spatial consistency in text-to-image models

A Chatterjee, GBM Stan, E Aflalo, S Paul… - … on Computer Vision, 2025 - Springer
One of the key shortcomings in current text-to-image (T2I) models is their inability to
consistently generate images which faithfully follow the spatial relationships specified in the …

CLIP-DPO: Vision-language models as a source of preference for fixing hallucinations in LVLMs

Y Ouali, A Bulat, B Martinez… - European Conference on …, 2025 - Springer
Despite recent successes, LVLMs or Large Vision Language Models are prone to
hallucinating details like objects and their properties or relations, limiting their real-world …

Hal-Eval: A universal and fine-grained hallucination evaluation framework for large vision language models

C Jiang, H Jia, M Dong, W Ye, H Xu, M Yan… - Proceedings of the …, 2024 - dl.acm.org
Large Vision-Language Models (LVLMs) exhibit remarkable capabilities but struggle
with "hallucinations" - inconsistencies between images and their descriptions. Previous …

Hallucination of multimodal large language models: A survey

Z Bai, P Wang, T Xiao, T He, Z Han, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
This survey presents a comprehensive analysis of the phenomenon of hallucination in
multimodal large language models (MLLMs), also known as Large Vision-Language Models …

Multi-object hallucination in vision-language models

X Chen, Z Ma, X Zhang, S Xu, S Qian, J Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large vision language models (LVLMs) often suffer from object hallucination, producing
objects not present in the given images. While current benchmarks for object hallucination …