Benchmarking large language models on CMExam - a comprehensive Chinese medical exam dataset

J Liu, P Zhou, Y Hua, D Chong, Z Tian… - Advances in …, 2024 - proceedings.neurips.cc
Recent advancements in large language models (LLMs) have transformed the field of
question answering (QA). However, evaluating LLMs in the medical field is challenging due …

The (r)evolution of multimodal large language models: A survey

D Caffagni, F Cocchi, L Barsellotti, N Moratelli… - arXiv preprint arXiv …, 2024 - arxiv.org
Connecting text and visual modalities plays an essential role in generative intelligence. For
this reason, inspired by the success of large language models, significant research efforts …

The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?

Q Zhao, M Xu, K Gupta, A Asthana, L Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Large vision-language models (LVLMs), designed to interpret and respond to human
instructions, occasionally generate hallucinated or harmful content due to inappropriate …

AdaShield: Safeguarding multimodal large language models from structure-based attack via adaptive shield prompting

Y Wang, X Liu, Y Li, M Chen, C Xiao - arXiv preprint arXiv:2403.09513, 2024 - arxiv.org
With the advent and widespread deployment of Multimodal Large Language Models
(MLLMs), the imperative to ensure their safety has become increasingly pronounced …

Eyes closed, safety on: Protecting multimodal LLMs via image-to-text transformation

Y Gou, K Chen, Z Liu, L Hong, H Xu, Z Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal large language models (MLLMs) have shown impressive reasoning abilities,
which, however, are also more vulnerable to jailbreak attacks than their LLM predecessors …

Safety of Multimodal Large Language Models on Images and Text

X Liu, Y Zhu, Y Lan, C Yang, Y Qiao - arXiv preprint arXiv:2402.00357, 2024 - arxiv.org
Attracted by the impressive power of Multimodal Large Language Models (MLLMs), the
public is increasingly utilizing them to improve the efficiency of daily work. Nonetheless, the …

Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt

Z Ying, A Liu, T Zhang, Z Yu, S Liang, X Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the realm of large vision language models (LVLMs), jailbreak attacks serve as a red-teaming
approach to bypass guardrails and uncover safety implications. Existing jailbreaks …

Direct large language model alignment through self-rewarding contrastive prompt distillation

A Liu, H Bai, Z Lu, X Kong, S Wang, J Shan… - arXiv preprint arXiv …, 2024 - arxiv.org
Aligning large language models (LLMs) with human expectations without human-annotated
preference data is an important problem. In this paper, we propose a method to evaluate the …

GradSafe: Detecting Unsafe Prompts for LLMs via Safety-Critical Gradient Analysis

Y Xie, M Fang, R Pi, N Gong - arXiv preprint arXiv:2402.13494, 2024 - arxiv.org
Large Language Models (LLMs) face threats from unsafe prompts. Existing methods for
detecting unsafe prompts are primarily online moderation APIs or finetuned LLMs. These …

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model

Y Zhang, L Chen, G Zheng, Y Gao, R Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
The emergence of Vision Language Models (VLMs) has brought unprecedented advances
in understanding multimodal information. The combination of textual and visual semantics in …