AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Cited by 164 · All 3 versions

The WMDP benchmark: Measuring and reducing malicious use with unlearning

N Li, A Pan, A Gopal, S Yue, D Berrios, A Gatti… - arXiv preprint arXiv …, 2024 - arxiv.org

The White House Executive Order on Artificial Intelligence highlights the risks of large
language models (LLMs) empowering malicious actors in developing biological, cyber, and …

Cited by 57 · All 3 versions

Detecting pretraining data from large language models

W Shi, A Ajith, M Xia, Y Huang, D Liu, T Blevins… - arXiv preprint arXiv …, 2023 - arxiv.org

Although large language models (LLMs) are widely deployed, the data used to train them is
rarely disclosed. Given the incredible scale of this data, up to trillions of tokens, it is all but …

Cited by 158 · All 4 versions

MUSE: Machine unlearning six-way evaluation for language models

W Shi, J Lee, Y Huang, S Malladi, J Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org

Language models (LMs) are trained on vast amounts of text data, which may include private
and copyrighted content. Data owners may request the removal of their data from a trained …

Cited by 17 · All 3 versions

Copyright protection in generative AI: A technical perspective

J Ren, H Xu, P He, Y Cui, S Zeng, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Generative AI has witnessed rapid advancement in recent years, expanding its
capabilities to create synthesized content such as text, images, audio, and code. The high …

Cited by 22 · All 2 versions

Perils and opportunities in using large language models in psychological research

S Abdurahman, M Atari, F Karimi-Malekabadi… - PNAS …, 2024 - academic.oup.com

The emergence of large language models (LLMs) has sparked considerable interest in their
potential application in psychological research, mainly as a model of the human psyche or …

Cited by 21 · All 5 versions

Knowledgeable preference alignment for LLMs in domain-specific question answering

Y Zhang, Z Chen, Y Fang, Y Lu, F Li, W Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Deploying large language models (LLMs) to real scenarios for domain-specific question
answering (QA) is a key thrust for LLM applications, which poses numerous challenges …

Cited by 17 · All 3 versions

Knowledge unlearning for LLMs: Tasks, methods, and challenges

N Si, H Zhang, H Chang, W Zhang, D Qu… - arXiv preprint arXiv …, 2023 - arxiv.org

In recent years, large language models (LLMs) have spurred a new research paradigm in
natural language processing. Despite their excellent capability in knowledge-based …

Cited by 34 · All 2 versions

Negative preference optimization: From catastrophic collapse to effective unlearning

R Zhang, L Lin, Y Bai, S Mei - arXiv preprint arXiv:2404.05868, 2024 - arxiv.org

Large Language Models (LLMs) often memorize sensitive, private, or copyrighted data
during pre-training. LLM unlearning aims to eliminate the influence of undesirable data from …

Cited by 49 · All 2 versions

Towards safer large language models through machine unlearning

Z Liu, G Dou, Z Tan, Y Tian, M Jiang - arXiv preprint arXiv:2402.10058, 2024 - arxiv.org

The rapid advancement of Large Language Models (LLMs) has demonstrated their vast
potential across various domains, attributed to their extensive pretraining knowledge and …

Cited by 39 · All 2 versions