Can Editing LLMs Inject Harm?

C Chen, B Huang, Z Li, Z Chen, S Lai, X Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge editing has been increasingly adopted to correct the false or outdated
knowledge in Large Language Models (LLMs). Meanwhile, one critical but under-explored …

Composable Interventions for Language Models

A Kolbeinsson, K O'Brien, T Huang, S Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Test-time interventions for language models can enhance factual accuracy, mitigate harmful
outputs, and improve model efficiency without costly retraining. But despite a flood of new …

Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs?

P Hase, T Hofweber, X Zhou, E Stengel-Eskin… - arXiv preprint arXiv …, 2024 - arxiv.org
The model editing problem concerns how language models should learn new facts about
the world over time. While empirical research on model editing has drawn widespread …

Editing Conceptual Knowledge for Large Language Models

X Wang, S Mao, N Zhang, S Deng, Y Yao… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, there has been a growing interest in knowledge editing for Large Language
Models (LLMs). Current approaches and evaluations merely explore the instance-level …

Can Knowledge Editing Really Correct Hallucinations?

B Huang, C Chen, X Xu, A Payani, K Shu - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) suffer from hallucinations, referring to the non-factual
information in generated content, despite their superior capacities across tasks. Meanwhile …

Detoxifying Large Language Models via Knowledge Editing

M Wang, N Zhang, Z Xu, Z Xi, S Deng, Y Yao… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper investigates using knowledge editing techniques to detoxify Large Language
Models (LLMs). We construct a benchmark, SafeEdit, which covers nine unsafe categories …

STACKFEED: Structured Textual Actor-Critic Knowledge Base Editing with FeedBack

N Gupta, S Kirtania, P Gupta, K Kariya… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) often generate incorrect or outdated information, especially
in low-resource settings or when dealing with private data. To address this, Retrieval …

Language Modeling with Editable External Knowledge

BZ Li, E Liu, A Ross, A Zeitoun, G Neubig… - arXiv preprint arXiv …, 2024 - arxiv.org
When the world changes, so does the text that humans write about it. How do we build
language models that can be easily updated to reflect these changes? One popular …

KnowTuning: Knowledge-aware Fine-tuning for Large Language Models

Y Lyu, L Yan, S Wang, H Shi, D Yin, P Ren… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite their success at many natural language processing (NLP) tasks, large language
models (LLMs) still struggle to effectively leverage knowledge for knowledge-intensive tasks …

Knowledge Localization: Mission Not Accomplished? Enter Query Localization!

Y Chen, P Cao, Y Chen, K Liu, J Zhao - arXiv preprint arXiv:2405.14117, 2024 - arxiv.org
Large language models (LLMs) store extensive factual knowledge, but the mechanisms
behind how they store and express this knowledge remain unclear. The Knowledge Neuron …