Jatmo: Prompt injection defense by task-specific finetuning

S Chen, J Piet, C Sitawarin, D Wagner - arXiv preprint arXiv:2402.06363, 2024 - arxiv.org

Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated
applications, which perform text-based tasks by utilizing their advanced language …

Cited by 18 · Related articles · All 3 versions

[PDF] arxiv.org

When LLMs meet cybersecurity: A systematic literature review

J Zhang, H Bu, H Wen, Y Chen, L Li, H Zhu - arXiv preprint arXiv …, 2024 - arxiv.org

The rapid advancements in large language models (LLMs) have opened new avenues
across various fields, including cybersecurity, which faces an ever-evolving threat landscape …

Cited by 15 · Related articles · All 2 versions

[PDF] arxiv.org

InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents

Q Zhan, Z Liang, Z Ying, D Kang - arXiv preprint arXiv:2403.02691, 2024 - arxiv.org

Recent work has embodied LLMs as agents, allowing them to access tools, perform actions,
and interact with external content (e.g., emails or websites). However, external content …

Cited by 14 · Related articles · All 2 versions

[PDF] arxiv.org

Prioritizing safeguarding over autonomy: Risks of LLM agents for science

X Tang, Q Jin, K Zhu, T Yuan, Y Zhang, W Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org

Intelligent agents powered by large language models (LLMs) have demonstrated substantial
promise in autonomously conducting experiments and facilitating scientific discoveries …

Cited by 13 · Related articles · All 3 versions

[PDF] arxiv.org

On the duality between sharpness-aware minimization and adversarial training

Y Zhang, H He, J Zhu, H Chen, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Adversarial Training (AT), which adversarially perturbs the input samples during training, has
been acknowledged as one of the most effective defenses against adversarial attacks, yet …

Cited by 4 · Related articles · All 3 versions

[PDF] aclanthology.org

A comprehensive study of jailbreak attack versus defense for large language models

Z Xu, Y Liu, G Deng, Y Li, S Picek - Findings of the Association for …, 2024 - aclanthology.org

Large Language Models (LLMs) have increasingly become central to generating
content with potential societal impacts. Notably, these models have demonstrated …

Cited by 4 · Related articles · All 3 versions

[PDF] arxiv.org

JailbreakZoo: Survey, landscapes, and horizons in jailbreaking large language and vision-language models

H Jin, L Hu, X Li, P Zhang, C Chen, J Zhuang… - arXiv preprint arXiv …, 2024 - arxiv.org

The rapid evolution of artificial intelligence (AI) through developments in Large Language
Models (LLMs) and Vision-Language Models (VLMs) has brought significant advancements …

Cited by 5 · Related articles · All 2 versions

[PDF] arxiv.org

LLM Jailbreak Attack versus Defense Techniques--A Comprehensive Study

Z Xu, Y Liu, G Deng, Y Li, S Picek - arXiv preprint arXiv:2402.13457, 2024 - arxiv.org

Large Language Models (LLMs) have increasingly become central to generating content
with potential societal impacts. Notably, these models have demonstrated capabilities for …

Cited by 23 · Related articles · All 2 versions

[PDF] arxiv.org

Boosting jailbreak attack with momentum

Y Zhang, Z Wei - arXiv preprint arXiv:2405.01229, 2024 - arxiv.org

Large Language Models (LLMs) have achieved remarkable success across diverse tasks,
yet they remain vulnerable to adversarial attacks, notably the well-documented …

Cited by 5 · Related articles · All 4 versions

[PDF] arxiv.org

Whispers in the Machine: Confidentiality in LLM-integrated Systems

J Evertz, M Chlosta, L Schönherr… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Models (LLMs) are increasingly integrated with external tools. While these
integrations can significantly improve the functionality of LLMs, they also create a new attack …

Cited by 11 · Related articles · All 3 versions