A survey of attacks on large vision-language models: Resources, advances, and future trends

D Liu, M Yang, X Qu, P Zhou, Y Cheng… - arXiv preprint arXiv …, 2024 - arxiv.org
With the significant development of large models in recent years, Large Vision-Language
Models (LVLMs) have demonstrated remarkable capabilities across a wide range of …

Eyes closed, safety on: Protecting multimodal LLMs via image-to-text transformation

Y Gou, K Chen, Z Liu, L Hong, H Xu, Z Li… - … on Computer Vision, 2025 - Springer
Multimodal large language models (MLLMs) have shown impressive reasoning abilities.
However, they are also more vulnerable to jailbreak attacks than their LLM predecessors …

Privacy in large language models: Attacks, defenses and future directions

H Li, Y Chen, J Luo, J Wang, H Peng, Y Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
The advancement of large language models (LLMs) has significantly enhanced the ability to
effectively tackle various downstream NLP tasks and unify these tasks into generative …

Controllable text generation for large language models: A survey

X Liang, H Wang, Y Wang, S Song, J Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
In Natural Language Processing (NLP), Large Language Models (LLMs) have demonstrated
high text generation quality. However, in real-world applications, LLMs must meet …

Cross-modality safety alignment

S Wang, X Ye, Q Cheng, J Duan, S Li, J Fu… - arXiv preprint arXiv …, 2024 - arxiv.org
As Artificial General Intelligence (AGI) becomes increasingly integrated into various facets of
human life, ensuring the safety and ethical alignment of such systems is paramount …

Locking down the finetuned LLMs safety

M Zhu, L Yang, Y Wei, N Zhang, Y Zhang - arXiv preprint arXiv …, 2024 - arxiv.org
Fine-tuning large language models (LLMs) on additional datasets is often necessary to
optimize them for specific downstream tasks. However, existing safety alignment measures …

BaThe: Defense against the jailbreak attack in multimodal large language models by treating harmful instruction as backdoor trigger

Y Chen, H Li, Z Zheng, Y Song - arXiv preprint arXiv:2408.09093, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) have showcased impressive performance in a
variety of multimodal tasks. On the other hand, the integration of additional image modality …

Towards tracing trustworthiness dynamics: Revisiting pre-training period of large language models

C Qian, J Zhang, W Yao, D Liu, Z Yin, Y Qiao… - arXiv preprint arXiv …, 2024 - arxiv.org
Ensuring the trustworthiness of large language models (LLMs) is crucial. Most studies
concentrate on fully pre-trained LLMs to better understand and improve LLMs' …

SafeAligner: Safety alignment against jailbreak attacks via response disparity guidance

C Huang, W Zhao, R Zheng, H Lv, S Dou, S Li… - arXiv preprint arXiv …, 2024 - arxiv.org
As the development of large language models (LLMs) rapidly advances, securing these
models effectively without compromising their utility has become a pivotal area of research …

Safety of Multimodal Large Language Models on Images and Text

X Liu, Y Zhu, Y Lan, C Yang, Y Qiao - arXiv preprint arXiv:2402.00357, 2024 - arxiv.org
Attracted by the impressive power of Multimodal Large Language Models (MLLMs), the
public is increasingly utilizing them to improve the efficiency of daily work. Nonetheless, the …