HalluciDoctor: Mitigating hallucinatory toxicity in visual instruction data

Q Yu, J Li, L Wei, L Pang, W Ye, B Qin… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract Multi-modal Large Language Models (MLLMs) tuned on machine-generated
instruction-following data have demonstrated remarkable performance in various multimodal …

How many unicorns are in this image? A safety evaluation benchmark for vision LLMs

H Tu, C Cui, Z Wang, Y Zhou, B Zhao, J Han… - arXiv preprint arXiv …, 2023 - arxiv.org
This work focuses on the potential of Vision LLMs (VLLMs) in visual reasoning. Different
from prior studies, we shift our focus from evaluating standard performance to introducing a …

LLaMA-adapter: Efficient fine-tuning of large language models with zero-initialized attention

R Zhang, J Han, C Liu, A Zhou, P Lu… - The Twelfth …, 2024 - openreview.net
With the rising tide of large language models (LLMs), there has been a growing interest in
developing general-purpose instruction-following models, e.g., ChatGPT. To this end, we …

Causal Inference with Latent Variables: Recent Advances and Future Prospectives

Y Zhu, Y He, J Ma, M Hu, S Li, J Li - Proceedings of the 30th ACM …, 2024 - dl.acm.org
Causality lays the foundation for the trajectory of our world. Causal inference (CI), which
aims to infer intrinsic causal relations among variables of interest, has emerged as a crucial …

Eyes can deceive: Benchmarking counterfactual reasoning abilities of multi-modal large language models

Y Li, W Tian, Y Jiao, J Chen - arXiv preprint arXiv:2404.12966, 2024 - arxiv.org
Counterfactual reasoning, as a crucial manifestation of human intelligence, refers to making
presuppositions based on established facts and extrapolating potential outcomes. Existing …

Sight beyond text: Multi-modal training enhances llms in truthfulness and ethics

H Tu, B Zhao, C Wei, C Xie - arXiv preprint arXiv:2309.07120, 2023 - arxiv.org
Multi-modal large language models (MLLMs) are trained based on large language models
(LLM), with an enhanced capability to comprehend multi-modal inputs and generate textual …

The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective

Z Qin, D Chen, W Zhang, L Yao, Y Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of large language models (LLMs) has been witnessed in recent
years. Based on the powerful LLMs, multi-modal LLMs (MLLMs) extend the modality from …

A Review of Multimodal Explainable Artificial Intelligence: Past, Present and Future

S Sun, W An, F Tian, F Nan, Q Liu, J Liu, N Shah… - arXiv preprint arXiv …, 2024 - arxiv.org
Artificial intelligence (AI) has rapidly developed through advancements in computational
power and the growth of massive datasets. However, this progress has also heightened …

How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

H Tu, C Cui, Z Wang, Y Zhou, B Zhao, J Han… - … on Computer Vision, 2025 - Springer
This work focuses on benchmarking the capabilities of vision large language models
(VLLMs) in visual reasoning. Different from prior studies, we shift our focus from evaluating …

Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches

A Mumuni, F Mumuni - arXiv preprint arXiv:2501.03151, 2025 - arxiv.org
Generative artificial intelligence (AI) systems based on large-scale pretrained foundation
models (PFMs) such as vision-language models, large language models (LLMs), diffusion …