TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu, Q Zhang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …

Position: TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu… - International …, 2024 - proceedings.mlr.press
Large language models (LLMs) have gained considerable attention for their excellent
natural language processing capabilities. Nonetheless, these LLMs present many …

Red-Teaming for generative AI: Silver bullet or security theater?

M Feffer, A Sinha, WH Deng, ZC Lipton… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
In response to rising concerns surrounding the safety, security, and trustworthiness of
Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red …

Unique security and privacy threats of large language model: A comprehensive survey

S Wang, T Zhu, B Liu, M Ding, X Guo, D Ye… - arXiv preprint arXiv …, 2024 - arxiv.org
With the rapid development of artificial intelligence, large language models (LLMs) have
made remarkable advancements in natural language processing. These models are trained …

Latent Guard: a safety framework for text-to-image generation

R Liu, A Khakzar, J Gu, Q Chen, P Torr… - European Conference on …, 2025 - Springer
With the ability to generate high-quality images, text-to-image (T2I) models can be exploited
for creating inappropriate content. To prevent misuse, existing safety measures are either …

Attacks and defenses for generative diffusion models: A comprehensive survey

VT Truong, LB Dang, LB Le - arXiv preprint arXiv:2408.03400, 2024 - arxiv.org
Diffusion models (DMs) have achieved state-of-the-art performance on various generative
tasks such as image synthesis, text-to-image, and text-guided image-to-image generation …

Jailbreaking prompt attack: A controllable adversarial attack against diffusion models

J Ma, A Cao, Z Xiao, Y Li, J Zhang, C Ye… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-image (T2I) models can be maliciously used to generate harmful content such as
sexually explicit, unfaithful, and misleading or Not-Safe-for-Work (NSFW) images. Previous …

SafeGen: Mitigating Sexually Explicit Content Generation in Text-to-Image Models

X Li, Y Yang, J Deng, C Yan, Y Chen, X Ji… - Proceedings of the 2024 …, 2024 - dl.acm.org
Text-to-image (T2I) models, such as Stable Diffusion, have exhibited remarkable
performance in generating high-quality images from text descriptions in recent years …

On large language models' resilience to coercive interrogation

Z Zhang, G Shen, G Tao, S Cheng… - 2024 IEEE Symposium on …, 2024 - computer.org
Large Language Models (LLMs) are increasingly employed in numerous
applications. It is hence important to ensure that their ethical standard aligns with humans' …
