Benchmarking and defending against indirect prompt injection attacks on large language models

J Liu, P Zhou, Y Hua, D Chong, Z Tian… - Advances in …, 2024 - proceedings.neurips.cc

Recent advancements in large language models (LLMs) have transformed the field of
question answering (QA). However, evaluating LLMs in the medical field is challenging due …

被引用次数：44 相关文章所有 6 个版本

[PDF] arxiv.org

The instruction hierarchy: Training llms to prioritize privileged instructions

E Wallace, K Xiao, R Leike, L Weng, J Heidecke… - arXiv preprint arXiv …, 2024 - arxiv.org

Today's LLMs are susceptible to prompt injections, jailbreaks, and other attacks that allow
adversaries to overwrite a model's original instructions with their own malicious prompts. In …

被引用次数：36 相关文章所有 2 个版本

[PDF] arxiv.org

StruQ: Defending against prompt injection with structured queries

S Chen, J Piet, C Sitawarin, D Wagner - arXiv preprint arXiv:2402.06363, 2024 - arxiv.org

Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated
applications, which perform text-based tasks by utilizing their advanced language …

被引用次数：21 相关文章所有 3 个版本

[PDF] arxiv.org

A new era in llm security: Exploring security concerns in real-world llm-based systems

F Wu, N Zhang, S Jha, P McDaniel, C Xiao - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Model (LLM) systems are inherently compositional, with individual LLM
serving as the core foundation with additional layers of objects such as plugins, sandbox …

被引用次数：30 相关文章所有 2 个版本

[PDF] arxiv.org

R-judge: Benchmarking safety risk awareness for llm agents

T Yuan, Z He, L Dong, Y Wang, R Zhao, T Xia… - arXiv preprint arXiv …, 2024 - arxiv.org

Large language models (LLMs) have exhibited great potential in autonomously completing
tasks across real-world applications. Despite this, these LLM agents introduce unexpected …

被引用次数：19 相关文章所有 4 个版本

[PDF] arxiv.org

MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance

R Pi, T Han, Y Xie, R Pan, Q Lian, H Dong… - arXiv preprint arXiv …, 2024 - arxiv.org

The deployment of multimodal large language models (MLLMs) has brought forth a unique
vulnerability: susceptibility to malicious attacks through visual inputs. We delve into the novel …

被引用次数：23 相关文章所有 3 个版本

[PDF] arxiv.org

Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents

Q Zhan, Z Liang, Z Ying, D Kang - arXiv preprint arXiv:2403.02691, 2024 - arxiv.org

Recent work has embodied LLMs as agents, allowing them to access tools, perform actions,
and interact with external content (eg, emails or websites). However, external content …

被引用次数：15 相关文章所有 2 个版本

[PDF] arxiv.org

Prioritizing safeguarding over autonomy: Risks of llm agents for science

X Tang, Q Jin, K Zhu, T Yuan, Y Zhang, W Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org

Intelligent agents powered by large language models (LLMs) have demonstrated substantial
promise in autonomously conducting experiments and facilitating scientific discoveries …

被引用次数：14 相关文章所有 3 个版本

[PDF] arxiv.org

Llm agents can autonomously exploit one-day vulnerabilities

R Fang, R Bindu, A Gupta, D Kang - arXiv preprint arXiv:2404.08144, 2024 - arxiv.org

LLMs have becoming increasingly powerful, both in their benign and malicious uses. With
the increase in capabilities, researchers have been increasingly interested in their ability to …

被引用次数：25 相关文章所有 2 个版本

[PDF] arxiv.org

Strengthening multimodal large language model with bootstrapped preference optimization

R Pi, T Han, W Xiong, J Zhang, R Liu, R Pan… - arXiv preprint arXiv …, 2024 - arxiv.org

Multimodal Large Language Models (MLLMs) excel in generating responses based on
visual inputs. However, they often suffer from a bias towards generating responses similar to …

被引用次数：9 相关文章所有 2 个版本