Faithscore: Evaluating hallucinations in large vision-language models

S Yin, C Fu, S Zhao, K Li, X Sun, T Xu… - National Science …, 2024 - academic.oup.com

Recently, Multimodal Large Language Model (MLLM) represented by GPT-4V has
been a new rising research hotspot, which uses powerful Large Language Models (LLMs) …

Cited by 135 · Related articles · All 5 versions

A survey on hallucination in large vision-language models

H Liu, W Xue, Y Chen, D Chen, X Zhao, K Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent development of Large Vision-Language Models (LVLMs) has attracted growing
attention within the AI landscape for its practical implementation potential. However, …

Cited by 134 · Related articles · All 2 versions

A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions

L Huang, W Yu, W Ma, W Zhong, Z Feng… - ACM Transactions on …, 2023 - dl.acm.org

The emergence of large language models (LLMs) has marked a significant breakthrough in
natural language processing (NLP), fueling a paradigm shift in information acquisition …

Cited by 732 · Related articles · All 2 versions

Unified hallucination detection for multimodal large language models

X Chen, C Wang, Y Xue, N Zhang, X Yang, Q Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Despite significant strides in multimodal tasks, Multimodal Large Language Models (MLLMs)
are plagued by the critical issue of hallucination. The reliable detection of such …

Cited by 39 · Related articles · All 3 versions

A survey of multimodal large language model from a data-centric perspective

T Bai, H Liang, B Wan, Y Xu, X Li, S Li, L Yang… - arXiv preprint arXiv …, 2024 - arxiv.org

Multimodal large language models (MLLMs) enhance the capabilities of standard large
language models by integrating and processing data from multiple modalities, including text …

Cited by 28 · Related articles · All 2 versions

Getting it right: Improving spatial consistency in text-to-image models

A Chatterjee, GBM Stan, E Aflalo, S Paul… - … on Computer Vision, 2025 - Springer

One of the key shortcomings in current text-to-image (T2I) models is their inability to
consistently generate images which faithfully follow the spatial relationships specified in the …

Cited by 10 · Related articles · All 2 versions

Clip-dpo: Vision-language models as a source of preference for fixing hallucinations in lvlms

Y Ouali, A Bulat, B Martinez… - European Conference on …, 2025 - Springer

Despite recent successes, LVLMs or Large Vision Language Models are prone to
hallucinating details like objects and their properties or relations, limiting their real-world …

Cited by 7 · Related articles · All 6 versions

Hal-eval: A universal and fine-grained hallucination evaluation framework for large vision language models

C Jiang, H Jia, M Dong, W Ye, H Xu, M Yan… - Proceedings of the …, 2024 - dl.acm.org

Large Vision-Language Models (LVLMs) exhibit remarkable capabilities but struggle
with "hallucinations" - inconsistencies between images and their descriptions. Previous …

Cited by 14 · Related articles · All 2 versions

Hallucination of multimodal large language models: A survey

Z Bai, P Wang, T Xiao, T He, Z Han, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

This survey presents a comprehensive analysis of the phenomenon of hallucination in
multimodal large language models (MLLMs), also known as Large Vision-Language Models …

Cited by 85 · Related articles · All 3 versions

Multi-object hallucination in vision-language models

X Chen, Z Ma, X Zhang, S Xu, S Qian, J Yang… - arXiv preprint arXiv …, 2024 - arxiv.org

Large vision language models (LVLMs) often suffer from object hallucination, producing
objects not present in the given images. While current benchmarks for object hallucination …

Cited by 12 · Related articles · All 5 versions