StruQ: Defending against prompt injection with structured queries

S Chen, J Piet, C Sitawarin, D Wagner - arXiv preprint arXiv:2402.06363, 2024 - arxiv.org
Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated
applications, which perform text-based tasks by utilizing their advanced language …

When LLMs meet cybersecurity: A systematic literature review

J Zhang, H Bu, H Wen, Y Chen, L Li, H Zhu - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancements in large language models (LLMs) have opened new avenues
across various fields, including cybersecurity, which faces an ever-evolving threat landscape …

InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents

Q Zhan, Z Liang, Z Ying, D Kang - arXiv preprint arXiv:2403.02691, 2024 - arxiv.org
Recent work has embodied LLMs as agents, allowing them to access tools, perform actions,
and interact with external content (e.g., emails or websites). However, external content …

Prioritizing safeguarding over autonomy: Risks of LLM agents for science

X Tang, Q Jin, K Zhu, T Yuan, Y Zhang, W Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Intelligent agents powered by large language models (LLMs) have demonstrated substantial
promise in autonomously conducting experiments and facilitating scientific discoveries …

On the duality between sharpness-aware minimization and adversarial training

Y Zhang, H He, J Zhu, H Chen, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Adversarial Training (AT), which adversarially perturbs the input samples during training, has
been acknowledged as one of the most effective defenses against adversarial attacks, yet …

A comprehensive study of jailbreak attack versus defense for large language models

Z Xu, Y Liu, G Deng, Y Li, S Picek - Findings of the Association for …, 2024 - aclanthology.org
Large Language Models (LLMs) have increasingly become central to generating
content with potential societal impacts. Notably, these models have demonstrated …

JailbreakZoo: Survey, landscapes, and horizons in jailbreaking large language and vision-language models

H Jin, L Hu, X Li, P Zhang, C Chen, J Zhuang… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid evolution of artificial intelligence (AI) through developments in Large Language
Models (LLMs) and Vision-Language Models (VLMs) has brought significant advancements …

LLM Jailbreak Attack versus Defense Techniques--A Comprehensive Study

Z Xu, Y Liu, G Deng, Y Li, S Picek - arXiv preprint arXiv:2402.13457, 2024 - arxiv.org
Large Language Models (LLMs) have increasingly become central to generating content
with potential societal impacts. Notably, these models have demonstrated capabilities for …

Boosting jailbreak attack with momentum

Y Zhang, Z Wei - arXiv preprint arXiv:2405.01229, 2024 - arxiv.org
Large Language Models (LLMs) have achieved remarkable success across diverse tasks,
yet they remain vulnerable to adversarial attacks, notably the well-documented …

Whispers in the Machine: Confidentiality in LLM-integrated Systems

J Evertz, M Chlosta, L Schönherr… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) are increasingly integrated with external tools. While these
integrations can significantly improve the functionality of LLMs, they also create a new attack …