Universal and transferable adversarial attacks on aligned language models - Academic Resource Search

Universal and transferable adversarial attacks on aligned language models

A Zou, Z Wang, N Carlini, M Nasr, JZ Kolter… - arXiv preprint arXiv …, 2023 - arxiv.org

… these attacks have required significant human ingenuity and are brittle in practice. Attempts
at automatic adversarial … and effective attack method that causes aligned language models to …

Cited by 648 · Related articles · All 8 versions

Set-level guidance attack: Boosting adversarial transferability of vision-language pre-training models

D Lu, Z Wang, T Wang, W Guan… - Proceedings of the …, 2023 - openaccess.thecvf.com

… of generalizable adversarial examples, we propose using set-level alignment-preserving …
Accelerating vision-language pretraining with free language modeling. In Proceedings of …

Cited by 28 · Related articles · All 5 versions

Universal Adversarial Perturbations for Vision-Language Pre-trained Models

PF Zhang, Z Huang, G Bai - Proceedings of the 47th International ACM …, 2024 - dl.acm.org

… Vision-language models form the cornerstone of a wide … proposed to learn aligned VLP
models that generate embeddings … Effective and Transferable Universal Adversarial Attack (ETU…

Cited by 1 · Related articles · All 2 versions

Are aligned neural networks adversarially aligned?

N Carlini, M Nasr… - Advances in …, 2024 - proceedings.neurips.cc

… attacks are simply not powerful enough to distinguish between robust and non-robust defenses:
even when we guarantee that an adversarial input on the language model … will transfer …

Cited by 173 · Related articles · All 6 versions

Automatic hallucination assessment for aligned large language models via transferable adversarial attacks

X Yu, H Cheng, X Liu, D Roth, J Gao - arXiv preprint arXiv:2310.12516, 2023 - arxiv.org

… to use prompting chaining to generate transferable adversarial attacks in the form of question-…
Finally, we find that the adversarial examples generated by our method are transferable …

Cited by 11 · Related articles · All 4 versions

Why do universal adversarial attacks work on large language models?: Geometry might be the answer

V Subhash, A Bialas, W Pan… - … Frontiers in Adversarial …, 2023 - openreview.net

… Triggers are seen to transfer across models. We observe that this transferability occurs
across a number of models using the same tokenization algorithm. This behavior has also been …

Cited by 4 · Related articles · All 3 versions

Transferable multimodal attack on vision-language pre-training models

H Wang, K Dong, Z Zhu, H Qin, A Liu, X Fang… - 2024 IEEE Symposium …, 2024 - computer.org

… Considering that VLP models rely more on aligned … [58], which are universal adversarial attack
defense methods. For … Tang, “Glm: General language model pretraining with autoregressive …

Cited by 11 · Related articles

From Noise to Clarity: Unraveling the Adversarial Suffix of Large Language Model Attacks via Translation of Text Embeddings

H Wang, H Li, M Huang, L Sha - arXiv preprint arXiv:2402.16006, 2024 - arxiv.org

… model to align the embedding dimension of the target model (d2… universal adversarial suffixes
while preserving the ability of … transferable adversarial suffixes to attack black box models, …

Cited by 2 · Related articles · All 2 versions

One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models

H Fang, J Kong, W Yu, B Chen, J Li, S Xia… - arXiv preprint arXiv …, 2024 - arxiv.org

… on multiple Large vision-language Models (LVLM), such as …, i.e., the transferable adversarial
attack which involves … we align with TMM [14] and consider some universal defense …

Cited by 3 · Related articles · All 2 versions

Adversarial attacks on deep-learning models in natural language processing: A survey

WE Zhang, QZ Sheng, A Alhazmi, C Li - ACM Transactions on Intelligent …, 2020 - dl.acm.org

… fluent and effective adversarial attacks [155]. MHA is based on language model and
Metropolis… model, but is also highly transferable to another model Show-Attend-and-Tell [147]. …

Cited by 689 · Related articles · All 6 versions