A survey on large language model (llm) security and privacy: The good, the bad, and the ugly

Y Yao, J Duan, K Xu, Y Cai, Z Sun, Y Zhang - High-Confidence Computing, 2024 - Elsevier
Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized
natural language understanding and generation. They possess deep language …

A comprehensive survey of forgetting in deep learning beyond continual learning

Z Wang, E Yang, L Shen, H Huang - arXiv preprint arXiv:2307.09218, 2023 - arxiv.org
Forgetting refers to the loss or deterioration of previously acquired information or knowledge.
While the existing surveys on forgetting have primarily focused on continual learning …

Investigating the catastrophic forgetting in multimodal large language models

Y Zhai, S Tong, X Li, M Cai, Q Qu, YJ Lee… - arXiv preprint arXiv …, 2023 - arxiv.org
Following the success of GPT4, there has been a surge in interest in multimodal large
language model (MLLM) research. This line of research focuses on developing general …

Prompt as triggers for backdoor attack: Examining the vulnerability in language models

S Zhao, J Wen, LA Tuan, J Zhao, J Fu - arXiv preprint arXiv:2305.01219, 2023 - arxiv.org
The prompt-based learning paradigm, which bridges the gap between pre-training and fine-
tuning, achieves state-of-the-art performance on several NLP tasks, particularly in few-shot …

A comprehensive study on robustness of image classification models: Benchmarking and rethinking

C Liu, Y Dong, W Xiang, X Yang, H Su, J Zhu… - International Journal of …, 2024 - Springer
The robustness of deep neural networks is frequently compromised when faced with
adversarial examples, common corruptions, and distribution shifts, posing a significant …

Certified robustness against natural language attacks by causal intervention

H Zhao, C Ma, X Dong, AT Luu… - International …, 2022 - proceedings.mlr.press
Deep learning models have achieved great success in many fields, yet they are vulnerable
to adversarial examples. This paper follows a causal perspective to look into the adversarial …

Flirt: Feedback loop in-context red teaming

N Mehrabi, P Goyal, C Dupuy, Q Hu, S Ghosh… - arXiv preprint arXiv …, 2023 - arxiv.org
Warning: this paper contains content that may be inappropriate or offensive. As generative
models become available for public use in various applications, testing and analyzing …

Pre-trained adversarial perturbations

Y Ban, Y Dong - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
Self-supervised pre-training has drawn increasing attention in recent years due to its
superior performance on numerous downstream tasks after fine-tuning. However, it is well …

Robustness analysis of video-language models against visual and language perturbations

M Schiappa, S Vyas, H Palangi… - Advances in Neural …, 2022 - proceedings.neurips.cc
Joint visual and language modeling on large-scale datasets has recently shown good
progress in multi-modal tasks when compared to single modal learning. However …

Defending against weight-poisoning backdoor attacks for parameter-efficient fine-tuning

S Zhao, L Gan, LA Tuan, J Fu, L Lyu, M Jia… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, various parameter-efficient fine-tuning (PEFT) strategies for application to
language models have been proposed and successfully implemented. However, this raises …