AI alignment: A comprehensive survey
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …
The WMDP benchmark: Measuring and reducing malicious use with unlearning
The White House Executive Order on Artificial Intelligence highlights the risks of large
language models (LLMs) empowering malicious actors in developing biological, cyber, and …
Detecting pretraining data from large language models
Although large language models (LLMs) are widely deployed, the data used to train them is
rarely disclosed. Given the incredible scale of this data, up to trillions of tokens, it is all but …
MUSE: Machine unlearning six-way evaluation for language models
Language models (LMs) are trained on vast amounts of text data, which may include private
and copyrighted content. Data owners may request the removal of their data from a trained …
Copyright protection in generative AI: A technical perspective
Generative AI has witnessed rapid advancement in recent years, expanding their
capabilities to create synthesized content such as text, images, audio, and code. The high …
Perils and opportunities in using large language models in psychological research
The emergence of large language models (LLMs) has sparked considerable interest in their
potential application in psychological research, mainly as a model of the human psyche or …
Knowledgeable preference alignment for LLMs in domain-specific question answering
Deploying large language models (LLMs) to real scenarios for domain-specific question
answering (QA) is a key thrust for LLM applications, which poses numerous challenges …
Knowledge unlearning for LLMs: Tasks, methods, and challenges
N Si, H Zhang, H Chang, W Zhang, D Qu… - arXiv preprint arXiv …, 2023 - arxiv.org
In recent years, large language models (LLMs) have spurred a new research paradigm in
natural language processing. Despite their excellent capability in knowledge-based …
Negative preference optimization: From catastrophic collapse to effective unlearning
Large Language Models (LLMs) often memorize sensitive, private, or copyrighted data
during pre-training. LLM unlearning aims to eliminate the influence of undesirable data from …
Towards safer large language models through machine unlearning
The rapid advancement of Large Language Models (LLMs) has demonstrated their vast
potential across various domains, attributed to their extensive pretraining knowledge and …