A holistic approach to undesired content detection in the real world

KS Kalyan - Natural Language Processing Journal, 2023 - Elsevier

Large language models (LLMs) are a special class of pretrained language models (PLMs)
obtained by scaling model size, pretraining corpus and computation. LLMs, because of their …

Cited by 77 · Related articles · All 5 versions

Survey of vulnerabilities in large language models revealed by adversarial attacks

E Shayegani, MAA Mamun, Y Fu, P Zaree… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Language Models (LLMs) are swiftly advancing in architecture and capability, and as
they integrate more deeply into complex systems, the urgency to scrutinize their security …

Cited by 56 · Related articles · All 2 versions

GPT-4 technical report

J Achiam, S Adler, S Agarwal, L Ahmad… - arXiv preprint arXiv …, 2023 - arxiv.org

We report the development of GPT-4, a large-scale, multimodal model which can accept
image and text inputs and produce text outputs. While less capable than humans in many …

Cited by 2570 · Related articles · All 3 versions

Multi-step jailbreaking privacy attacks on ChatGPT

H Li, D Guo, W Fan, M Xu, J Huang, F Meng… - arXiv preprint arXiv …, 2023 - arxiv.org

With the rapid progress of large language models (LLMs), many downstream NLP tasks can
be well solved given appropriate prompts. Though model developers and researchers work …

Cited by 220 · Related articles · All 5 versions

"Do anything now": Characterizing and evaluating in-the-wild jailbreak prompts on large language models

X Shen, Z Chen, M Backes, Y Shen… - arXiv preprint arXiv …, 2023 - arxiv.org

The misuse of large language models (LLMs) has garnered significant attention from the
general public and LLM vendors. In response, efforts have been made to align LLMs with …

Cited by 208 · Related articles · All 2 versions

Exploiting programmatic behavior of LLMs: Dual-use through standard security attacks

D Kang, X Li, I Stoica, C Guestrin… - 2024 IEEE Security …, 2024 - ieeexplore.ieee.org

Recent advances in instruction-following large language models (LLMs) have led to
dramatic improvements in a range of NLP tasks. Unfortunately, we find that the same …

Cited by 145 · Related articles · All 3 versions

Ignore previous prompt: Attack techniques for language models

F Perez, I Ribeiro - arXiv preprint arXiv:2211.09527, 2022 - arxiv.org

Transformer-based large language models (LLMs) provide a powerful foundation for natural
language tasks in large-scale customer-facing applications. However, studies that explore …

Cited by 200 · Related articles · All 3 versions

GPTFuzzer: Red teaming large language models with auto-generated jailbreak prompts

J Yu, X Lin, X Xing - arXiv preprint arXiv:2309.10253, 2023 - arxiv.org

Large language models (LLMs) have recently experienced tremendous popularity and are
widely used from casual conversations to AI-driven programming. However, despite their …

Cited by 114 · Related articles · All 2 versions

Multilingual jailbreak challenges in large language models

Y Deng, W Zhang, SJ Pan, L Bing - arXiv preprint arXiv:2310.06474, 2023 - arxiv.org

While large language models (LLMs) exhibit remarkable capabilities across a wide range of
tasks, they pose potential safety concerns, such as the "jailbreak" problem, wherein …

Cited by 85 · Related articles · All 5 versions

Llama Guard: LLM-based input-output safeguard for human-AI conversations

H Inan, K Upasani, J Chi, R Rungta, K Iyer… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce Llama Guard, an LLM-based input-output safeguard model geared towards
Human-AI conversation use cases. Our model incorporates a safety risk taxonomy, a …

Cited by 102 · Related articles · All 2 versions