Composable interventions for language models
Test-time interventions for language models can enhance factual accuracy, mitigate harmful
outputs, and improve model efficiency without costly retraining. But despite a flood of new …
Position: LLM Unlearning Benchmarks are Weak Measures of Progress
Unlearning methods have the potential to improve the privacy and safety of large language
models (LLMs) by removing sensitive or harmful information post hoc. The LLM unlearning …
A Closer Look at Machine Unlearning for Large Language Models
Large language models (LLMs) may memorize sensitive or copyrighted content, raising
privacy and legal concerns. Due to the high cost of retraining from scratch, researchers …