Combating misinformation in the age of LLMs: Opportunities and challenges

C Chen, K Shu - AI Magazine, 2023 - Wiley Online Library
Misinformation such as fake news and rumors is a serious threat to information ecosystems
and public trust. The emergence of large language models (LLMs) has great potential to …

A survey of GPT-3 family large language models including ChatGPT and GPT-4

KS Kalyan - Natural Language Processing Journal, 2023 - Elsevier
Large language models (LLMs) are a special class of pretrained language models (PLMs)
obtained by scaling model size, pretraining corpus and computation. LLMs, because of their …

Baseline defenses for adversarial attacks against aligned language models

N Jain, A Schwarzschild, Y Wen, G Somepalli… - arXiv preprint arXiv …, 2023 - arxiv.org
As Large Language Models quickly become ubiquitous, their security vulnerabilities are
critical to understand. Recent work shows that text optimizers can produce jailbreaking …

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

MART: Improving LLM safety with multi-round automatic red-teaming

S Ge, C Zhou, R Hou, M Khabsa, YC Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Red-teaming is a common practice for mitigating unsafe behaviors in Large Language
Models (LLMs), which involves thoroughly assessing LLMs to identify potential flaws and …

Automatically Correcting Large Language Models: Surveying the Landscape of Diverse Automated Correction Strategies

L Pan, M Saxon, W Xu, D Nathani, X Wang… - Transactions of the …, 2024 - direct.mit.edu
While large language models (LLMs) have shown remarkable effectiveness in various NLP
tasks, they are still prone to issues such as hallucination, unfaithful reasoning, and toxicity. A …

Ring-A-Bell! How Reliable are Concept Removal Methods for Diffusion Models?

YL Tsai, CY Hsu, C Xie, CH Lin, JY Chen, B Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Diffusion models for text-to-image (T2I) synthesis, such as Stable Diffusion (SD), have
recently demonstrated exceptional capabilities for generating high-quality content. However …

Red-Teaming for Generative AI: Silver Bullet or Security Theater?

M Feffer, A Sinha, ZC Lipton, H Heidari - arXiv preprint arXiv:2401.15897, 2024 - arxiv.org
In response to rising concerns surrounding the safety, security, and trustworthiness of
Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red …

Confidence matters: Revisiting intrinsic self-correction capabilities of large language models

L Li, G Chen, Y Su, Z Chen, Y Zhang, E Xing… - arXiv preprint arXiv …, 2024 - arxiv.org
The recent success of Large Language Models (LLMs) has catalyzed an increasing interest
in their self-correction capabilities. This paper presents a comprehensive investigation into …

Exploring safety generalization challenges of large language models via code

Q Ren, C Gao, J Shao, J Yan, X Tan, W Lam… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancement of Large Language Models (LLMs) has brought about remarkable
capabilities in natural language processing but also raised concerns about their potential …