AI deception: A survey of examples, risks, and potential solutions

PS Park, S Goldstein, A O'Gara, M Chen, D Hendrycks - Patterns, 2024 - cell.com
This paper argues that a range of current AI systems have learned how to deceive humans.
We define deception as the systematic inducement of false beliefs in the pursuit of some …

A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly

Y Yao, J Duan, K Xu, Y Cai, Z Sun, Y Zhang - High-Confidence Computing, 2024 - Elsevier
Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized
natural language understanding and generation. They possess deep language …

Siren's song in the AI ocean: a survey on hallucination in large language models

Y Zhang, Y Li, L Cui, D Cai, L Liu, T Fu… - arXiv preprint arXiv …, 2023 - arxiv.org
While large language models (LLMs) have demonstrated remarkable capabilities across a
range of downstream tasks, a significant concern revolves around their propensity to exhibit …

The rise and potential of large language model based agents: A survey

Z Xi, W Chen, X Guo, W He, Y Ding, B Hong… - arXiv preprint arXiv …, 2023 - arxiv.org
For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing
the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are …

Fine-tuning aligned language models compromises safety, even when users do not intend to!

X Qi, Y Zeng, T Xie, PY Chen, R Jia, P Mittal… - arXiv preprint arXiv …, 2023 - arxiv.org
Optimizing large language models (LLMs) for downstream use cases often involves the
customization of pre-trained LLMs through further fine-tuning. Meta's open release of Llama …

Large language models cannot self-correct reasoning yet

J Huang, X Chen, S Mishra, HS Zheng, AW Yu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have emerged as a groundbreaking technology with their
unparalleled text generation capabilities across various applications. Nevertheless …

GPTFuzzer: Red teaming large language models with auto-generated jailbreak prompts

J Yu, X Lin, Z Yu, X Xing - arXiv preprint arXiv:2309.10253, 2023 - arxiv.org
Large language models (LLMs) have recently experienced tremendous popularity and are
widely used from casual conversations to AI-driven programming. However, despite their …

Scalable extraction of training data from (production) language models

M Nasr, N Carlini, J Hayase, M Jagielski… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper studies extractable memorization: training data that an adversary can efficiently
extract by querying a machine learning model without prior knowledge of the training …

Catastrophic jailbreak of open-source LLMs via exploiting generation

Y Huang, S Gupta, M Xia, K Li, D Chen - arXiv preprint arXiv:2310.06987, 2023 - arxiv.org
The rapid progress in open-source large language models (LLMs) is significantly advancing
AI development. Extensive efforts have been made before model release to align their …

Large language models: a comprehensive survey of its applications, challenges, limitations, and future prospects

MU Hadi, Q Al Tashi, A Shah, R Qureshi… - Authorea …, 2024 - authorea.com
Within the vast expanse of computerized language processing, a revolutionary entity known
as Large Language Models (LLMs) has emerged, wielding immense power in its capacity to …