A review of modern recommender systems using generative models (gen-recsys)
Traditional recommender systems typically use user-item rating histories as their main data
source. However, deep generative models now have the capability to model and sample …
Jailbreak attacks and defenses against large language models: A survey
Large Language Models (LLMs) have performed exceptionally in various text-generative
tasks, including question answering, translation, code completion, etc. However, the over …
Breaking down the defenses: A comparative survey of attacks on large language models
Large Language Models (LLMs) have become a cornerstone in the field of Natural
Language Processing (NLP), offering transformative capabilities in understanding and …
Promptcrypt: Prompt encryption for secure communication with large language models
Cloud-based large language models (LLMs) such as ChatGPT have increasingly become
integral to daily operations, serving as vital tools across various applications. While these …
Why AI is weird and should not be this way: Towards AI for everyone, with everyone, by everyone
This paper presents a vision for creating AI systems that are inclusive at every stage of
development, from data collection to model design and evaluation. We address key …
Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt
In the realm of large vision language models (LVLMs), jailbreak attacks serve as a red-
teaming approach to bypass guardrails and uncover safety implications. Existing jailbreaks …
Jailbreaking LLMs with Arabic transliteration and Arabizi
This study identifies the potential vulnerabilities of Large Language Models (LLMs)
to 'jailbreak' attacks, specifically focusing on the Arabic language and its various forms. While …
Navigating the risks: A survey of security, privacy, and ethics threats in llm-based agents
With the continuous development of large language models (LLMs), transformer-based
models have made groundbreaking advances in numerous natural language processing …
JailbreakEval: An Integrated Toolkit for Evaluating Jailbreak Attempts Against Large Language Models
Jailbreak attacks aim to induce Large Language Models (LLMs) to generate harmful
responses for forbidden instructions, presenting severe misuse threats to LLMs. Up to now …
JAILJUDGE: A Comprehensive Jailbreak Judge Benchmark with Multi-Agent Enhanced Explanation Evaluation Framework
Despite advancements in enhancing LLM safety against jailbreak attacks, evaluating LLM
defenses remains a challenge, with current methods often lacking explainability and …