Combating misinformation in the age of LLMs: Opportunities and challenges
Misinformation such as fake news and rumors is a serious threat to information ecosystems
and public trust. The emergence of large language models (LLMs) has great potential to …
A survey of GPT-3 family large language models including ChatGPT and GPT-4
KS Kalyan - Natural Language Processing Journal, 2023 - Elsevier
Large language models (LLMs) are a special class of pretrained language models (PLMs)
obtained by scaling model size, pretraining corpus and computation. LLMs, because of their …
Baseline defenses for adversarial attacks against aligned language models
As Large Language Models quickly become ubiquitous, their security vulnerabilities are
critical to understand. Recent work shows that text optimizers can produce jailbreaking …
Foundational challenges in assuring alignment and safety of large language models
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …
MART: Improving LLM safety with multi-round automatic red-teaming
Red-teaming is a common practice for mitigating unsafe behaviors in Large Language
Models (LLMs), which involves thoroughly assessing LLMs to identify potential flaws and …
Automatically Correcting Large Language Models: Surveying the Landscape of Diverse Automated Correction Strategies
While large language models (LLMs) have shown remarkable effectiveness in various NLP
tasks, they are still prone to issues such as hallucination, unfaithful reasoning, and toxicity. A …
Ring-A-Bell! How Reliable are Concept Removal Methods for Diffusion Models?
Diffusion models for text-to-image (T2I) synthesis, such as Stable Diffusion (SD), have
recently demonstrated exceptional capabilities for generating high-quality content. However …
Red-Teaming for Generative AI: Silver Bullet or Security Theater?
In response to rising concerns surrounding the safety, security, and trustworthiness of
Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red …
Confidence matters: Revisiting intrinsic self-correction capabilities of large language models
The recent success of Large Language Models (LLMs) has catalyzed an increasing interest
in their self-correction capabilities. This paper presents a comprehensive investigation into …
Exploring safety generalization challenges of large language models via code
The rapid advancement of Large Language Models (LLMs) has brought about remarkable
capabilities in natural language processing but also raised concerns about their potential …