AI deception: A survey of examples, risks, and potential solutions

PS Park, S Goldstein, A O'Gara, M Chen, D Hendrycks - Patterns, 2024 - cell.com
This paper argues that a range of current AI systems have learned how to deceive humans.
We define deception as the systematic inducement of false beliefs in the pursuit of some …

A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly

Y Yao, J Duan, K Xu, Y Cai, Z Sun, Y Zhang - High-Confidence Computing, 2024 - Elsevier
Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized
natural language understanding and generation. They possess deep language …

Siren's song in the AI ocean: a survey on hallucination in large language models

Y Zhang, Y Li, L Cui, D Cai, L Liu, T Fu… - arXiv preprint arXiv …, 2023 - arxiv.org
While large language models (LLMs) have demonstrated remarkable capabilities across a
range of downstream tasks, a significant concern revolves around their propensity to exhibit …

The rise and potential of large language model based agents: A survey

Z Xi, W Chen, X Guo, W He, Y Ding, B Hong… - arXiv preprint arXiv …, 2023 - arxiv.org
For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing
the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are …

Fine-tuning aligned language models compromises safety, even when users do not intend to!

X Qi, Y Zeng, T Xie, PY Chen, R Jia, P Mittal… - arXiv preprint arXiv …, 2023 - arxiv.org
Optimizing large language models (LLMs) for downstream use cases often involves the
customization of pre-trained LLMs through further fine-tuning. Meta's open release of Llama …

Large language models cannot self-correct reasoning yet

J Huang, X Chen, S Mishra, HS Zheng, AW Yu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have emerged as a groundbreaking technology with their
unparalleled text generation capabilities across various applications. Nevertheless …

GPTFuzzer: Red teaming large language models with auto-generated jailbreak prompts

J Yu, X Lin, Z Yu, X Xing - arXiv preprint arXiv:2309.10253, 2023 - arxiv.org
Large language models (LLMs) have recently experienced tremendous popularity and are
widely used from casual conversations to AI-driven programming. However, despite their …

Scalable extraction of training data from (production) language models

M Nasr, N Carlini, J Hayase, M Jagielski… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper studies extractable memorization: training data that an adversary can efficiently
extract by querying a machine learning model without prior knowledge of the training …

Catastrophic jailbreak of open-source LLMs via exploiting generation

Y Huang, S Gupta, M Xia, K Li, D Chen - arXiv preprint arXiv:2310.06987, 2023 - arxiv.org
The rapid progress in open-source large language models (LLMs) is significantly advancing
AI development. Extensive efforts have been made before model release to align their …

Large language models: a comprehensive survey of its applications, challenges, limitations, and future prospects

MU Hadi, Q Al Tashi, A Shah, R Qureshi… - Authorea …, 2024 - authorea.com
Within the vast expanse of computerized language processing, a revolutionary entity known
as Large Language Models (LLMs) has emerged, wielding immense power in its capacity to …