All versions - Academic Resource Search

Universal and transferable adversarial attacks on aligned language models

A Zou, Z Wang, N Carlini, M Nasr, JZ Kolter… - arXiv preprint arXiv …, 2023 - arxiv.org

Because "out-of-the-box" large language models are capable of generating a great deal of
objectionable content, recent work has focused on aligning these models in an attempt to …

Cited by 662 · Related articles

[PDF] llm-attacks.org

[PDF] Universal and Transferable Adversarial Attacks on Aligned Language Models

A Zou, Z Wang, JZ Kolter, M Fredrikson - 2023 - llm-attacks.org

Abstract Because “out-of-the-box” large language models are capable of generating a great
deal of objectionable content, recent work has focused on aligning these models in an …

[PDF] radensa.ru

[PDF] Universal and Transferable Adversarial Attacks on Aligned Language Models

A Zou, Z Wang, JZ Kolter, M Fredrikson - 2023 - newsletter.radensa.ru

Abstract Because “out-of-the-box” large language models are capable of generating a great
deal of objectionable content, recent work has focused on aligning these models in an …

Universal and Transferable Adversarial Attacks on Aligned Language Models

A Zou, Z Wang, J Zico Kolter, M Fredrikson - arXiv e-prints, 2023 - ui.adsabs.harvard.edu

Abstract Because "out-of-the-box" large language models are capable of generating a great
deal of objectionable content, recent work has focused on aligning these models in an …

[PDF] future4200.com

[PDF] Universal and Transferable Adversarial Attacks on Aligned Language Models

A Zou, Z Wang, JZ Kolter, M Fredrikson - 2023 - future4200.com

Abstract Because “out-of-the-box” large language models are capable of generating a great
deal of objectionable content, recent work has focused on aligning these models in an …

[PDF] jordan.im

[PDF] Universal and Transferable Adversarial Attacks on Aligned Language Models

A Zou, Z Wang, JZ Kolter, M Fredrikson - arXiv preprint arXiv:2307.15043, 2023 - r.jordan.im

Abstract Because “out-of-the-box” large language models are capable of generating a great
deal of objectionable content, recent work has focused on aligning these models in an …

Universal and Transferable Adversarial Attacks on Aligned Language Models

A Zou, Z Wang, JZ Kolter, M Fredrikson - surrealyz.github.io

Universal and Transferable Adversarial Attacks on Aligned Language Models. Andy Zou, Zifan …

[PDF] isc2.org

[PDF] Universal and Transferable Adversarial Attacks on Aligned Language Models

A Zou, Z Wang, JZ Kolter… - arXiv preprint arXiv …, 2023 - community.isc2.org

Abstract Because “out-of-the-box” large language models are capable of generating a great
deal of objectionable content, recent work has focused on aligning these models in an …