Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey
With the rapid development of artificial intelligence, large language models (LLMs) have
made remarkable progress in natural language processing. These models are trained on …
Open Problems in Technical AI Governance
AI progress is creating a growing range of risks and opportunities, but it is often unclear how
they should be navigated. In many cases, the barriers and uncertainties faced are at least …
Adaptive Guardrails For Large Language Models via Trust Modeling and In-Context Learning
J Hu, Y Dong, X Huang - arXiv preprint arXiv:2408.08959, 2024 - arxiv.org
Guardrails have become an integral part of Large language models (LLMs), by moderating
harmful or toxic response in order to maintain LLMs' alignment to human expectations …
Knowledge Return Oriented Prompting (KROP)
J Martin, K Yeung - arXiv preprint arXiv:2406.11880, 2024 - arxiv.org
Many Large Language Models (LLMs) and LLM-powered apps deployed today use some
form of prompt filter or alignment to protect their integrity. However, these measures aren't …