A survey on large language model (llm) security and privacy: The good, the bad, and the ugly

Y Yao, J Duan, K Xu, Y Cai, Z Sun, Y Zhang - High-Confidence Computing, 2024 - Elsevier
Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized
natural language understanding and generation. They possess deep language …

A comprehensive survey of forgetting in deep learning beyond continual learning

Z Wang, E Yang, L Shen, H Huang - arXiv preprint arXiv:2307.09218, 2023 - arxiv.org
Forgetting refers to the loss or deterioration of previously acquired information or knowledge.
While the existing surveys on forgetting have primarily focused on continual learning …

Investigating the catastrophic forgetting in multimodal large language models

Y Zhai, S Tong, X Li, M Cai, Q Qu, YJ Lee… - arXiv preprint arXiv …, 2023 - arxiv.org
Following the success of GPT4, there has been a surge in interest in multimodal large
language model (MLLM) research. This line of research focuses on developing general …

Prompt as triggers for backdoor attack: Examining the vulnerability in language models

S Zhao, J Wen, LA Tuan, J Zhao, J Fu - arXiv preprint arXiv:2305.01219, 2023 - arxiv.org
The prompt-based learning paradigm, which bridges the gap between pre-training and fine-
tuning, achieves state-of-the-art performance on several NLP tasks, particularly in few-shot …

A comprehensive study on robustness of image classification models: Benchmarking and rethinking

C Liu, Y Dong, W Xiang, X Yang, H Su, J Zhu… - International Journal of …, 2024 - Springer
The robustness of deep neural networks is frequently compromised when faced with
adversarial examples, common corruptions, and distribution shifts, posing a significant …

Certified robustness against natural language attacks by causal intervention

H Zhao, C Ma, X Dong, AT Luu… - International …, 2022 - proceedings.mlr.press
Deep learning models have achieved great success in many fields, yet they are vulnerable
to adversarial examples. This paper follows a causal perspective to look into the adversarial …

Flirt: Feedback loop in-context red teaming

N Mehrabi, P Goyal, C Dupuy, Q Hu, S Ghosh… - arXiv preprint arXiv …, 2023 - arxiv.org
Warning: this paper contains content that may be inappropriate or offensive. As generative
models become available for public use in various applications, testing and analyzing …

Pre-trained adversarial perturbations

Y Ban, Y Dong - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
Self-supervised pre-training has drawn increasing attention in recent years due to its
superior performance on numerous downstream tasks after fine-tuning. However, it is well …

Robustness analysis of video-language models against visual and language perturbations

M Schiappa, S Vyas, H Palangi… - Advances in Neural …, 2022 - proceedings.neurips.cc
Joint visual and language modeling on large-scale datasets has recently shown good
progress in multi-modal tasks when compared to single modal learning. However …

Defending against weight-poisoning backdoor attacks for parameter-efficient fine-tuning

S Zhao, L Gan, LA Tuan, J Fu, L Lyu, M Jia… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, various parameter-efficient fine-tuning (PEFT) strategies for application to
language models have been proposed and successfully implemented. However, this raises …