Language model behavior: A comprehensive survey
Transformer language models have received widespread public attention, yet their
generated text is often surprising even to NLP researchers. In this survey, we discuss over …
Visual adversarial examples jailbreak aligned large language models
Warning: this paper contains data, prompts, and model outputs that are offensive in nature.
Recently, there has been a surge of interest in integrating vision into Large Language …
Why so toxic? Measuring and triggering toxic behavior in open-domain chatbots
Chatbots are used in many applications, e.g., automated agents, smart home assistants,
interactive characters in online games, etc. Therefore, it is crucial to ensure they do not …
Visual adversarial examples jailbreak large language models
Recently, there has been a surge of interest in introducing vision into Large Language
Models (LLMs). The proliferation of large Visual Language Models (VLMs), such as …
FLIRT: Feedback loop in-context red teaming
Warning: this paper contains content that may be inappropriate or offensive. As generative
models become available for public use in various applications, testing and analyzing …
Robustness of models addressing Information Disorder: A comprehensive review and benchmarking study
Machine learning and deep learning models are increasingly susceptible to
adversarial attacks, particularly in critical areas like cybersecurity and Information Disorder …
Beyond detection: a defend-and-summarize strategy for robust and interpretable rumor analysis on social media
As the impact of social media gradually escalates, people are more likely to be exposed to
indistinguishable fake news. Therefore, numerous studies have attempted to detect rumors …
Run like a girl! Sports-related gender bias in language and vision
S Harrison, E Gualdoni, G Boleda - arXiv preprint arXiv:2305.14468, 2023 - arxiv.org
Gender bias in Language and Vision datasets and models has the potential to perpetuate
harmful stereotypes and discrimination. We analyze gender bias in two Language and …
Privacy preserving large language models: ChatGPT case study based vision and framework
The generative Artificial Intelligence (AI) tools based on Large Language Models (LLMs) use
billions of parameters to extensively analyse large datasets and extract critical private …
Gradient-based language model red teaming
Red teaming is a common strategy for identifying weaknesses in generative language
models (LMs), where adversarial prompts are produced that trigger an LM to generate …