HalluciDoctor: Mitigating hallucinatory toxicity in visual instruction data
Multi-modal Large Language Models (MLLMs) tuned on machine-generated
instruction-following data have demonstrated remarkable performance in various multimodal …
How many unicorns are in this image? a safety evaluation benchmark for vision llms
This work focuses on the potential of Vision LLMs (VLLMs) in visual reasoning. Different
from prior studies, we shift our focus from evaluating standard performance to introducing a …
LLaMA-adapter: Efficient fine-tuning of large language models with zero-initialized attention
With the rising tide of large language models (LLMs), there has been a growing interest in
developing general-purpose instruction-following models, e.g., ChatGPT. To this end, we …
Causal Inference with Latent Variables: Recent Advances and Future Prospectives
Causality lays the foundation for the trajectory of our world. Causal inference (CI), which
aims to infer intrinsic causal relations among variables of interest, has emerged as a crucial …
Eyes can deceive: Benchmarking counterfactual reasoning abilities of multi-modal large language models
Counterfactual reasoning, as a crucial manifestation of human intelligence, refers to making
presuppositions based on established facts and extrapolating potential outcomes. Existing …
Sight beyond text: Multi-modal training enhances llms in truthfulness and ethics
Multi-modal large language models (MLLMs) are trained based on large language models
(LLM), with an enhanced capability to comprehend multi-modal inputs and generate textual …
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
The rapid development of large language models (LLMs) has been witnessed in recent
years. Based on the powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from …
A Review of Multimodal Explainable Artificial Intelligence: Past, Present and Future
Artificial intelligence (AI) has rapidly developed through advancements in computational
power and the growth of massive datasets. However, this progress has also heightened …
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
This work focuses on benchmarking the capabilities of vision large language models
(VLLMs) in visual reasoning. Different from prior studies, we shift our focus from evaluating …
Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches
A Mumuni, F Mumuni - arXiv preprint arXiv:2501.03151, 2025 - arxiv.org
Generative artificial intelligence (AI) systems based on large-scale pretrained foundation
models (PFMs) such as vision-language models, large language models (LLMs), diffusion …