A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly
Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized
natural language understanding and generation. They possess deep language …
Tree of attacks: Jailbreaking black-box LLMs automatically
While Large Language Models (LLMs) display versatile functionality, they continue
to generate harmful, biased, and toxic content, as demonstrated by the prevalence of …
A StrongREJECT for empty jailbreaks
The rise of large language models (LLMs) has drawn attention to the existence of
"jailbreaks" that allow the models to be used maliciously. However, there is no standard …
Breaking down the defenses: A comparative survey of attacks on large language models
Large Language Models (LLMs) have become a cornerstone in the field of Natural
Language Processing (NLP), offering transformative capabilities in understanding and …
AttackEval: How to evaluate the effectiveness of jailbreak attacking on large language models
In our research, we pioneer a novel approach to evaluate the effectiveness of jailbreak
attacks on Large Language Models (LLMs), such as GPT-4 and LLaMa2, diverging from …
Safeguarding Large Language Models: A Survey
In the burgeoning field of Large Language Models (LLMs), developing a robust safety
mechanism, colloquially known as "safeguards" or "guardrails", has become imperative to …
Chain of Attack: a Semantic-Driven Contextual Multi-Turn attacker for LLM
X Yang, X Tang, S Hu, J Han - arXiv preprint arXiv:2405.05610, 2024 - arxiv.org
Large language models (LLMs) have achieved remarkable performance in various natural
language processing tasks, especially in dialogue systems. However, LLMs may also pose …
Refusal in Language Models Is Mediated by a Single Direction
Conversational large language models are fine-tuned for both instruction-following and
safety, resulting in models that obey benign requests but refuse harmful ones. While this …
Enhancing Jailbreak Attack Against Large Language Models through Silent Tokens
Along with the remarkable successes of large language models, recent research has also
started to explore the security threats of LLMs, including jailbreaking attacks. Attackers …