A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly

Y Yao, J Duan, K Xu, Y Cai, Z Sun, Y Zhang - High-Confidence Computing, 2024 - Elsevier
Abstract Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized
natural language understanding and generation. They possess deep language …

Tree of attacks: Jailbreaking black-box LLMs automatically

A Mehrotra, M Zampetakis, P Kassianik… - arXiv preprint arXiv …, 2023 - ciso2ciso.com
Abstract While Large Language Models (LLMs) display versatile functionality, they continue
to generate harmful, biased, and toxic content, as demonstrated by the prevalence of …

A StrongREJECT for empty jailbreaks

A Souly, Q Lu, D Bowen, T Trinh, E Hsieh… - arXiv preprint arXiv …, 2024 - arxiv.org
The rise of large language models (LLMs) has drawn attention to the existence of "jailbreaks" that allow the models to be used maliciously. However, there is no standard …

Breaking down the defenses: A comparative survey of attacks on large language models

AG Chowdhury, MM Islam, V Kumar, FH Shezan… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have become a cornerstone in the field of Natural
Language Processing (NLP), offering transformative capabilities in understanding and …

AttackEval: How to evaluate the effectiveness of jailbreak attacking on large language models

M Jin, S Zhu, B Wang, Z Zhou, C Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
In our research, we pioneer a novel approach to evaluate the effectiveness of jailbreak
attacks on Large Language Models (LLMs), such as GPT-4 and LLaMa2, diverging from …

Safeguarding Large Language Models: A Survey

Y Dong, R Mu, Y Zhang, S Sun, T Zhang, C Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the burgeoning field of Large Language Models (LLMs), developing a robust safety mechanism, colloquially known as "safeguards" or "guardrails", has become imperative to …

Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM

X Yang, X Tang, S Hu, J Han - arXiv preprint arXiv:2405.05610, 2024 - arxiv.org
Large language models (LLMs) have achieved remarkable performance in various natural language processing tasks, especially in dialogue systems. However, LLMs may also pose …

Refusal in Language Models Is Mediated by a Single Direction

A Arditi, O Obeso, A Syed, D Paleka, N Rimsky… - arXiv preprint arXiv …, 2024 - arxiv.org
Conversational large language models are fine-tuned for both instruction-following and
safety, resulting in models that obey benign requests but refuse harmful ones. While this …

Enhancing Jailbreak Attack Against Large Language Models through Silent Tokens

J Yu, H Luo, J Yao-Chieh, W Guo, H Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Along with the remarkable successes of large language models, recent research has also started to explore the security threats of LLMs, including jailbreaking attacks. Attackers …