A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly

Y Yao, J Duan, K Xu, Y Cai, Z Sun, Y Zhang - High-Confidence Computing, 2024 - Elsevier
Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized
natural language understanding and generation. They possess deep language …

TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu, Q Zhang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …

Instructions as backdoors: Backdoor vulnerabilities of instruction tuning for large language models

J Xu, MD Ma, F Wang, C Xiao, M Chen - arXiv preprint arXiv:2305.14710, 2023 - arxiv.org
We investigate security concerns of the emergent instruction tuning paradigm, in which models
are trained on crowdsourced datasets with task instructions to achieve superior …

Position: TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu… - International …, 2024 - proceedings.mlr.press
Large language models (LLMs) have gained considerable attention for their excellent
natural language processing capabilities. Nonetheless, these LLMs present many …

Cognitive overload: Jailbreaking large language models with overloaded logical thinking

N Xu, F Wang, B Zhou, BZ Li, C Xiao… - arXiv preprint arXiv …, 2023 - arxiv.org
While large language models (LLMs) have demonstrated increasing power, they have also
given rise to a wide range of harmful behaviors. Among the most representative, jailbreak attacks can …

Learning to poison large language models during instruction tuning

Y Qiang, X Zhou, SZ Zade, MA Roshani… - arXiv preprint arXiv …, 2024 - arxiv.org
The advent of Large Language Models (LLMs) has marked significant achievements in
language processing and reasoning capabilities. Despite their advancements, LLMs face …

Hijacking large language models via adversarial in-context learning

Y Qiang - 2024 - search.proquest.com
In-context learning (ICL) has emerged as a powerful paradigm leveraging LLMs for specific
downstream tasks by utilizing labeled examples as demonstrations in the precondition …

Mitigating backdoor threats to large language models: Advancement and challenges

Q Liu, W Mo, T Tong, J Xu, F Wang… - 2024 60th Annual …, 2024 - ieeexplore.ieee.org
The advancement of Large Language Models (LLMs) has significantly impacted various
domains, including Web search, healthcare, and software development. However, as these …

Rethinking Backdoor Detection Evaluation for Language Models

J Yan, WJ Mo, X Ren, R Jia - arXiv preprint arXiv:2409.00399, 2024 - arxiv.org
Backdoor attacks, in which a model behaves maliciously when given an attacker-specified
trigger, pose a major security risk for practitioners who depend on publicly released …

Combating Security and Privacy Issues in the Era of Large Language Models

M Chen, C Xiao, H Sun, L Li, L Derczynski… - Proceedings of the …, 2024 - aclanthology.org
This tutorial seeks to provide a systematic summary of risks and vulnerabilities in security,
privacy, and copyright aspects of large language models (LLMs), and the most recent solutions …