A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly

Y Yao, J Duan, K Xu, Y Cai, Z Sun, Y Zhang - High-Confidence Computing, 2024 - Elsevier
Abstract Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized
natural language understanding and generation. They possess deep language …

Tree of attacks: Jailbreaking black-box LLMs automatically

A Mehrotra, M Zampetakis, P Kassianik… - arXiv preprint arXiv …, 2023 - ciso2ciso.com
Abstract While Large Language Models (LLMs) display versatile functionality, they continue
to generate harmful, biased, and toxic content, as demonstrated by the prevalence of …

A StrongREJECT for empty jailbreaks

A Souly, Q Lu, D Bowen, T Trinh, E Hsieh… - arXiv preprint arXiv …, 2024 - arxiv.org
The rise of large language models (LLMs) has drawn attention to the existence of "jailbreaks" that allow the models to be used maliciously. However, there is no standard …

Breaking down the defenses: A comparative survey of attacks on large language models

AG Chowdhury, MM Islam, V Kumar, FH Shezan… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have become a cornerstone in the field of Natural
Language Processing (NLP), offering transformative capabilities in understanding and …

AttackEval: How to evaluate the effectiveness of jailbreak attacking on large language models

M Jin, S Zhu, B Wang, Z Zhou, C Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
In our research, we pioneer a novel approach to evaluate the effectiveness of jailbreak
attacks on Large Language Models (LLMs), such as GPT-4 and LLaMa2, diverging from …

Safeguarding Large Language Models: A Survey

Y Dong, R Mu, Y Zhang, S Sun, T Zhang, C Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the burgeoning field of Large Language Models (LLMs), developing a robust safety mechanism, colloquially known as "safeguards" or "guardrails", has become imperative to …

Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM

X Yang, X Tang, S Hu, J Han - arXiv preprint arXiv:2405.05610, 2024 - arxiv.org
Large language models (LLMs) have achieved remarkable performance in various natural language processing tasks, especially in dialogue systems. However, LLMs may also pose …

Refusal in Language Models Is Mediated by a Single Direction

A Arditi, O Obeso, A Syed, D Paleka, N Rimsky… - arXiv preprint arXiv …, 2024 - arxiv.org
Conversational large language models are fine-tuned for both instruction-following and
safety, resulting in models that obey benign requests but refuse harmful ones. While this …

Enhancing Jailbreak Attack Against Large Language Models through Silent Tokens

J Yu, H Luo, J Yao-Chieh, W Guo, H Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Along with the remarkable successes of large language models, recent research has also started to explore the security threats of LLMs, including jailbreaking attacks. Attackers …