TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu, Q Zhang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …

Position: TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu… - International …, 2024 - proceedings.mlr.press
Large language models (LLMs) have gained considerable attention for their excellent
natural language processing capabilities. Nonetheless, these LLMs present many …

Red-Teaming for generative AI: Silver bullet or security theater?

M Feffer, A Sinha, WH Deng, ZC Lipton… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
In response to rising concerns surrounding the safety, security, and trustworthiness of
Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red …

Unique security and privacy threats of large language model: A comprehensive survey

S Wang, T Zhu, B Liu, M Ding, X Guo, D Ye… - arXiv preprint arXiv …, 2024 - arxiv.org
With the rapid development of artificial intelligence, large language models (LLMs) have
made remarkable advancements in natural language processing. These models are trained …

Latent Guard: a safety framework for text-to-image generation

R Liu, A Khakzar, J Gu, Q Chen, P Torr… - European Conference on …, 2025 - Springer
With the ability to generate high-quality images, text-to-image (T2I) models can be exploited
for creating inappropriate content. To prevent misuse, existing safety measures are either …

Attacks and defenses for generative diffusion models: A comprehensive survey

VT Truong, LB Dang, LB Le - arXiv preprint arXiv:2408.03400, 2024 - arxiv.org
Diffusion models (DMs) have achieved state-of-the-art performance on various generative
tasks such as image synthesis, text-to-image, and text-guided image-to-image generation …

Jailbreaking prompt attack: A controllable adversarial attack against diffusion models

J Ma, A Cao, Z Xiao, Y Li, J Zhang, C Ye… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-image (T2I) models can be maliciously used to generate harmful content such as
sexually explicit, unfaithful, and misleading or Not-Safe-for-Work (NSFW) images. Previous …

SafeGen: Mitigating Sexually Explicit Content Generation in Text-to-Image Models

X Li, Y Yang, J Deng, C Yan, Y Chen, X Ji… - Proceedings of the 2024 …, 2024 - dl.acm.org
Text-to-image (T2I) models, such as Stable Diffusion, have exhibited remarkable
performance in generating high-quality images from text descriptions in recent years …

On large language models' resilience to coercive interrogation

Z Zhang, G Shen, G Tao, S Cheng… - 2024 IEEE Symposium on …, 2024 - computer.org
Large Language Models (LLMs) are increasingly employed in numerous
applications. It is hence important to ensure that their ethical standard aligns with humans' …
