A survey on multimodal large language models
S Yin, C Fu, S Zhao, K Li, X Sun, T Xu… - National Science …, 2024 - academic.oup.com
Recently, Multimodal Large Language Model (MLLM) represented by GPT-4V has
been a new rising research hotspot, which uses powerful Large Language Models (LLMs) …
A survey on hallucination in large vision-language models
Recent development of Large Vision-Language Models (LVLMs) has attracted growing
attention within the AI landscape for its practical implementation potential. However, …
A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions
The emergence of large language models (LLMs) has marked a significant breakthrough in
natural language processing (NLP), fueling a paradigm shift in information acquisition …
Unified hallucination detection for multimodal large language models
Despite significant strides in multimodal tasks, Multimodal Large Language Models (MLLMs)
are plagued by the critical issue of hallucination. The reliable detection of such …
A survey of multimodal large language model from a data-centric perspective
Multimodal large language models (MLLMs) enhance the capabilities of standard large
language models by integrating and processing data from multiple modalities, including text …
Getting it right: Improving spatial consistency in text-to-image models
One of the key shortcomings in current text-to-image (T2I) models is their inability to
consistently generate images which faithfully follow the spatial relationships specified in the …
Clip-dpo: Vision-language models as a source of preference for fixing hallucinations in lvlms
Despite recent successes, LVLMs or Large Vision Language Models are prone to
hallucinating details like objects and their properties or relations, limiting their real-world …
Hal-eval: A universal and fine-grained hallucination evaluation framework for large vision language models
Large Vision-Language Models (LVLMs) exhibit remarkable capabilities but struggle
with "hallucinations" - inconsistencies between images and their descriptions. Previous …
Hallucination of multimodal large language models: A survey
This survey presents a comprehensive analysis of the phenomenon of hallucination in
multimodal large language models (MLLMs), also known as Large Vision-Language Models …
Multi-object hallucination in vision-language models
Large vision language models (LVLMs) often suffer from object hallucination, producing
objects not present in the given images. While current benchmarks for object hallucination …