Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey

S Wang, T Zhu, B Liu, D Ming, X Guo, D Ye… - arXiv preprint arXiv …, 2024 - arxiv.org
With the rapid development of artificial intelligence, large language models (LLMs) have
made remarkable progress in natural language processing. These models are trained on …

Open problems in technical AI governance

A Reuel, B Bucknall, S Casper, T Fist, L Soder… - arXiv preprint arXiv …, 2024 - arxiv.org
AI progress is creating a growing range of risks and opportunities, but it is often unclear how
they should be navigated. In many cases, the barriers and uncertainties faced are at least …

Adaptive Guardrails For Large Language Models via Trust Modeling and In-Context Learning

J Hu, Y Dong, X Huang - arXiv preprint arXiv:2408.08959, 2024 - arxiv.org
Guardrails have become an integral part of Large language models (LLMs), by moderating
harmful or toxic response in order to maintain LLMs' alignment to human expectations …

Knowledge Return Oriented Prompting (KROP)

J Martin, K Yeung - arXiv preprint arXiv:2406.11880, 2024 - arxiv.org
Many Large Language Models (LLMs) and LLM-powered apps deployed today use some
form of prompt filter or alignment to protect their integrity. However, these measures aren't …