Benchmarking large language models on cmexam-a comprehensive chinese medical exam dataset

J Liu, P Zhou, Y Hua, D Chong, Z Tian… - Advances in …, 2024 - proceedings.neurips.cc
Recent advancements in large language models (LLMs) have transformed the field of
question answering (QA). However, evaluating LLMs in the medical field is challenging due …

The instruction hierarchy: Training llms to prioritize privileged instructions

E Wallace, K Xiao, R Leike, L Weng, J Heidecke… - arXiv preprint arXiv …, 2024 - arxiv.org
Today's LLMs are susceptible to prompt injections, jailbreaks, and other attacks that allow
adversaries to overwrite a model's original instructions with their own malicious prompts. In …

StruQ: Defending against prompt injection with structured queries

S Chen, J Piet, C Sitawarin, D Wagner - arXiv preprint arXiv:2402.06363, 2024 - arxiv.org
Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated
applications, which perform text-based tasks by utilizing their advanced language …

A new era in llm security: Exploring security concerns in real-world llm-based systems

F Wu, N Zhang, S Jha, P McDaniel, C Xiao - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Model (LLM) systems are inherently compositional, with individual LLM
serving as the core foundation with additional layers of objects such as plugins, sandbox …

R-judge: Benchmarking safety risk awareness for llm agents

T Yuan, Z He, L Dong, Y Wang, R Zhao, T Xia… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have exhibited great potential in autonomously completing
tasks across real-world applications. Despite this, these LLM agents introduce unexpected …

MLLM-Protector: Ensuring MLLM's Safety without Hurting Performance

R Pi, T Han, Y Xie, R Pan, Q Lian, H Dong… - arXiv preprint arXiv …, 2024 - arxiv.org
The deployment of multimodal large language models (MLLMs) has brought forth a unique
vulnerability: susceptibility to malicious attacks through visual inputs. We delve into the novel …

Injecagent: Benchmarking indirect prompt injections in tool-integrated large language model agents

Q Zhan, Z Liang, Z Ying, D Kang - arXiv preprint arXiv:2403.02691, 2024 - arxiv.org
Recent work has embodied LLMs as agents, allowing them to access tools, perform actions,
and interact with external content (eg, emails or websites). However, external content …

Prioritizing safeguarding over autonomy: Risks of llm agents for science

X Tang, Q Jin, K Zhu, T Yuan, Y Zhang, W Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Intelligent agents powered by large language models (LLMs) have demonstrated substantial
promise in autonomously conducting experiments and facilitating scientific discoveries …

Llm agents can autonomously exploit one-day vulnerabilities

R Fang, R Bindu, A Gupta, D Kang - arXiv preprint arXiv:2404.08144, 2024 - arxiv.org
LLMs have becoming increasingly powerful, both in their benign and malicious uses. With
the increase in capabilities, researchers have been increasingly interested in their ability to …

Strengthening multimodal large language model with bootstrapped preference optimization

R Pi, T Han, W Xiong, J Zhang, R Liu, R Pan… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) excel in generating responses based on
visual inputs. However, they often suffer from a bias towards generating responses similar to …