Safeguarding Large Language Models: A Survey

4 results (0.02 seconds)

Unique Security and Privacy Threats of Large Language Model: A Comprehensive Survey

S Wang, T Zhu, B Liu, D Ming, X Guo, D Ye… - arXiv preprint arXiv …, 2024 - arxiv.org

With the rapid development of artificial intelligence, large language models (LLMs) have
made remarkable progress in natural language processing. These models are trained on …

Cited by 1; 2 versions

Open problems in technical AI governance

A Reuel, B Bucknall, S Casper, T Fist, L Soder… - arXiv preprint arXiv …, 2024 - arxiv.org

AI progress is creating a growing range of risks and opportunities, but it is often unclear how
they should be navigated. In many cases, the barriers and uncertainties faced are at least …

Cited by 2; 3 versions

Adaptive Guardrails For Large Language Models via Trust Modeling and In-Context Learning

J Hu, Y Dong, X Huang - arXiv preprint arXiv:2408.08959, 2024 - arxiv.org

Guardrails have become an integral part of Large language models (LLMs), by moderating
harmful or toxic response in order to maintain LLMs' alignment to human expectations …

2 versions

Knowledge Return Oriented Prompting (KROP)

J Martin, K Yeung - arXiv preprint arXiv:2406.11880, 2024 - arxiv.org

Many Large Language Models (LLMs) and LLM-powered apps deployed today use some
form of prompt filter or alignment to protect their integrity. However, these measures aren't …