Universal and transferable adversarial attacks on aligned language models

A Zou, Z Wang, N Carlini, M Nasr, JZ Kolter… - arXiv preprint arXiv …, 2023 - arxiv.org
… these attacks have required significant human ingenuity and are brittle in practice. Attempts
at automatic adversarial … and effective attack method that causes aligned language models to …

Set-level guidance attack: Boosting adversarial transferability of vision-language pre-training models

D Lu, Z Wang, T Wang, W Guan… - Proceedings of the …, 2023 - openaccess.thecvf.com
… of generalizable adversarial examples, we propose using set-level alignment-preserving …
Accelerating vision-language pretraining with free language modeling. In Proceedings of …

Universal Adversarial Perturbations for Vision-Language Pre-trained Models

PF Zhang, Z Huang, G Bai - Proceedings of the 47th International ACM …, 2024 - dl.acm.org
… Vision-language models form the cornerstone of a wide … proposed to learn aligned VLP
models that generate embeddings … Effective and Transferable Universal Adversarial Attack (ETU…

Are aligned neural networks adversarially aligned?

N Carlini, M Nasr… - Advances in …, 2024 - proceedings.neurips.cc
… attacks are simply not powerful enough to distinguish between robust and non-robust defenses:
even when we guarantee that an adversarial input on the language model … will transfer

Automatic hallucination assessment for aligned large language models via transferable adversarial attacks

X Yu, H Cheng, X Liu, D Roth, J Gao - arXiv preprint arXiv:2310.12516, 2023 - arxiv.org
… to use prompt chaining to generate transferable adversarial attacks in the form of question-…
Finally, we find that the adversarial examples generated by our method are transferable …

Why do universal adversarial attacks work on large language models?: Geometry might be the answer

V Subhash, A Bialas, W Pan… - … Frontiers in Adversarial …, 2023 - openreview.net
… Triggers are seen to transfer across models. We observe that this transferability occurs
across a number of models using the same tokenization algorithm. This behavior has also been …

Transferable multimodal attack on vision-language pre-training models

H Wang, K Dong, Z Zhu, H Qin, A Liu, X Fang… - 2024 IEEE Symposium …, 2024 - computer.org
… Considering that VLP models rely more on aligned … [58], which are universal adversarial attack
defense methods. For … Tang, “GLM: General language model pretraining with autoregressive …

From Noise to Clarity: Unraveling the Adversarial Suffix of Large Language Model Attacks via Translation of Text Embeddings

H Wang, H Li, M Huang, L Sha - arXiv preprint arXiv:2402.16006, 2024 - arxiv.org
… model to align the embedding dimension of the target model (d2… universal adversarial suffixes
while preserving the ability of … transferable adversarial suffixes to attack black box models, …

One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models

H Fang, J Kong, W Yu, B Chen, J Li, S Xia… - arXiv preprint arXiv …, 2024 - arxiv.org
… on multiple large vision-language models (LVLMs), such as …, i.e., the transferable adversarial
attack which involves … we align with TMM [14] and consider some universal defense …

Adversarial attacks on deep-learning models in natural language processing: A survey

WE Zhang, QZ Sheng, A Alhazmi, C Li - ACM Transactions on Intelligent …, 2020 - dl.acm.org
… fluent and effective adversarial attacks [155]. MHA is based on language model and
Metropolis… model, but is also highly transferable to another model Show-Attend-and-Tell [147]. …