InferAligner: Inference-time alignment for harmlessness through cross-model guidance

D Liu, M Yang, X Qu, P Zhou, Y Cheng… - arXiv preprint arXiv …, 2024 - arxiv.org

With the significant development of large models in recent years, Large Vision-Language
Models (LVLMs) have demonstrated remarkable capabilities across a wide range of …

Cited by 15 · Related articles · All 2 versions

[PDF] arxiv.org

Eyes closed, safety on: Protecting multimodal LLMs via image-to-text transformation

Y Gou, K Chen, Z Liu, L Hong, H Xu, Z Li… - … on Computer Vision, 2025 - Springer

Multimodal large language models (MLLMs) have shown impressive reasoning abilities.
However, they are also more vulnerable to jailbreak attacks than their LLM predecessors …

Cited by 19 · Related articles · All 2 versions

[PDF] arxiv.org

Privacy in large language models: Attacks, defenses and future directions

H Li, Y Chen, J Luo, J Wang, H Peng, Y Kang… - arXiv preprint arXiv …, 2023 - arxiv.org

The advancement of large language models (LLMs) has significantly enhanced the ability to
effectively tackle various downstream NLP tasks and unify these tasks into generative …

Cited by 48 · Related articles · All 2 versions

[PDF] arxiv.org

Controllable text generation for large language models: A survey

X Liang, H Wang, Y Wang, S Song, J Yang… - arXiv preprint arXiv …, 2024 - arxiv.org

In Natural Language Processing (NLP), Large Language Models (LLMs) have demonstrated
high text generation quality. However, in real-world applications, LLMs must meet …

Cited by 4 · Related articles · All 4 versions

[PDF] arxiv.org

Cross-modality safety alignment

S Wang, X Ye, Q Cheng, J Duan, S Li, J Fu… - arXiv preprint arXiv …, 2024 - arxiv.org

As Artificial General Intelligence (AGI) becomes increasingly integrated into various facets of
human life, ensuring the safety and ethical alignment of such systems is paramount …

Cited by 7 · Related articles · All 3 versions

[PDF] arxiv.org

Locking down the finetuned LLMs safety

M Zhu, L Yang, Y Wei, N Zhang, Y Zhang - arXiv preprint arXiv …, 2024 - arxiv.org

Fine-tuning large language models (LLMs) on additional datasets is often necessary to
optimize them for specific downstream tasks. However, existing safety alignment measures …

Cited by 4 · Related articles · All 3 versions

[PDF] arxiv.org

BaThe: Defense against the jailbreak attack in multimodal large language models by treating harmful instruction as backdoor trigger

Y Chen, H Li, Z Zheng, Y Song - arXiv preprint arXiv:2408.09093, 2024 - arxiv.org

Multimodal Large Language Models (MLLMs) have showcased impressive performance in a
variety of multimodal tasks. On the other hand, the integration of additional image modality …

Cited by 4 · Related articles · All 2 versions

[PDF] arxiv.org

Towards tracing trustworthiness dynamics: Revisiting pre-training period of large language models

C Qian, J Zhang, W Yao, D Liu, Z Yin, Y Qiao… - arXiv preprint arXiv …, 2024 - arxiv.org

Ensuring the trustworthiness of large language models (LLMs) is crucial. Most studies
concentrate on fully pre-trained LLMs to better understand and improve LLMs' …

Cited by 11 · Related articles · All 3 versions

[PDF] arxiv.org

SafeAligner: Safety alignment against jailbreak attacks via response disparity guidance

C Huang, W Zhao, R Zheng, H Lv, S Dou, S Li… - arXiv preprint arXiv …, 2024 - arxiv.org

As the development of large language models (LLMs) rapidly advances, securing these
models effectively without compromising their utility has become a pivotal area of research …

Cited by 3 · Related articles · All 3 versions

[PDF] arxiv.org

Safety of Multimodal Large Language Models on Images and Text

X Liu, Y Zhu, Y Lan, C Yang, Y Qiao - arXiv preprint arXiv:2402.00357, 2024 - arxiv.org

Attracted by the impressive power of Multimodal Large Language Models (MLLMs), the
public is increasingly utilizing them to improve the efficiency of daily work. Nonetheless, the …

Cited by 7 · Related articles · All 2 versions