From Noise to Clarity: Unraveling the Adversarial Suffix of Large Language Model Attacks via Translation of Text Embeddings

H Wang, H Li, M Huang, L Sha - arXiv preprint arXiv:2402.16006, 2024 - arxiv.org
The safety defense methods of Large Language Models (LLMs) stay limited because the
dangerous prompts are manually curated to just a few known attack types, which fails to keep …
