HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
Abstract Multi-modal Large Language Models (MLLMs) tuned on machine-generated
instruction-following data have demonstrated remarkable performance in various multimodal …
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
This work focuses on the potential of Vision LLMs (VLLMs) in visual reasoning. Different
from prior studies, we shift our focus from evaluating standard performance to introducing a …
LLaMA-adapter: Efficient fine-tuning of large language models with zero-initialized attention
With the rising tide of large language models (LLMs), there has been a growing interest in
developing general-purpose instruction-following models, e.g., ChatGPT. To this end, we …
Sight beyond text: Multi-modal training enhances llms in truthfulness and ethics
Multi-modal large language models (MLLMs) are trained based on large language models
(LLM), with an enhanced capability to comprehend multi-modal inputs and generate textual …
Causal Inference with Latent Variables: Recent Advances and Future Prospectives
Causality lays the foundation for the trajectory of our world. Causal inference (CI), which
aims to infer intrinsic causal relations among variables of interest, has emerged as a crucial …
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
The rapid development of large language models (LLMs) has been witnessed in recent
years. Based on the powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from …
Eyes Can Deceive: Benchmarking Counterfactual Reasoning Abilities of Multi-modal Large Language Models
Counterfactual reasoning, as a crucial manifestation of human intelligence, refers to making
presuppositions based on established facts and extrapolating potential outcomes. Existing …
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning
The advancement of large language models (LLMs) has significantly broadened the scope
of applications in natural language processing, with multi-modal LLMs extending these …
Enhancing Multimodal Understanding With LIUS: A Novel Framework for Visual Question Answering in Digital Marketing
C Song - Journal of Organizational and End User Computing …, 2024 - igi-global.com
VQA (visual question and answer) is the task of enabling a computer to generate accurate
textual answers based on given images and related questions. It integrates computer vision …
How Many Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
This work focuses on benchmarking the capabilities of vision large language models
(VLLMs) in visual reasoning. Different from prior studies, we shift our focus from evaluating …