TrustLLM: Trustworthiness in large language models
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …
Red-Teaming for generative AI: Silver bullet or security theater?
In response to rising concerns surrounding the safety, security, and trustworthiness of
Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red …
Unique security and privacy threats of large language model: A comprehensive survey
With the rapid development of artificial intelligence, large language models (LLMs) have
made remarkable advancements in natural language processing. These models are trained …
Latent guard: a safety framework for text-to-image generation
With the ability to generate high-quality images, text-to-image (T2I) models can be exploited
for creating inappropriate content. To prevent misuse, existing safety measures are either …
Attacks and defenses for generative diffusion models: A comprehensive survey
Diffusion models (DMs) have achieved state-of-the-art performance on various generative
tasks such as image synthesis, text-to-image, and text-guided image-to-image generation …
Jailbreaking prompt attack: A controllable adversarial attack against diffusion models
Text-to-image (T2I) models can be maliciously used to generate harmful content such as
sexually explicit, unfaithful, and misleading or Not-Safe-for-Work (NSFW) images. Previous …
SafeGen: Mitigating Sexually Explicit Content Generation in Text-to-Image Models
Text-to-image (T2I) models, such as Stable Diffusion, have exhibited remarkable
performance in generating high-quality images from text descriptions in recent years …
On large language models' resilience to coercive interrogation
Large Language Models (LLMs) are increasingly employed in numerous
applications. It is hence important to ensure that their ethical standard aligns with humans' …