A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …
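
To make the snippet's core idea concrete, here is a minimal, hypothetical sketch (not code from the survey): a reward model is fit to pairwise human preference labels via a Bradley-Terry model, and the learned reward then stands in for an engineered reward function when training an RL agent. The feature representation, toy data, and dimensions below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each trajectory tau is summarized by a feature vector phi(tau).
# A (simulated) human compares pairs of trajectories and labels the preferred one.
dim = 4
true_w = rng.normal(size=dim)            # hidden "true" human preference (assumption)
pairs = rng.normal(size=(500, 2, dim))   # 500 comparisons of (tau_a, tau_b)

# Simulated human labels: 1 if tau_a is preferred, drawn from a Bradley-Terry model.
logits = pairs[:, 0] @ true_w - pairs[:, 1] @ true_w
labels = (rng.uniform(size=500) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

# Fit a linear reward model r(tau) = w . phi(tau) by maximizing the
# Bradley-Terry log-likelihood with plain gradient ascent.
w = np.zeros(dim)
lr = 0.1
for _ in range(200):
    z = pairs[:, 0] @ w - pairs[:, 1] @ w                  # reward differences
    p = 1.0 / (1.0 + np.exp(-z))                           # P(tau_a preferred)
    grad = ((labels - p)[:, None] * (pairs[:, 0] - pairs[:, 1])).mean(axis=0)
    w += lr * grad

# The learned reward model can now score new trajectories for an RL agent,
# replacing a hand-engineered reward function.
print("cosine(true_w, learned_w) =",
      true_w @ w / (np.linalg.norm(true_w) * np.linalg.norm(w)))

The learned weights recover the direction of the simulated preference, which is the property an RL policy-optimization step would rely on; any real RLHF pipeline would replace the linear model with a neural reward model over prompt-response pairs.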

Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions

M Miranda, ES Ruzzetti, A Santilli, FM Zanzotto… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) represent a significant advancement in artificial
intelligence, finding applications across various domains. However, their reliance on …

Privacy-Preserving Instructions for Aligning Large Language Models

D Yu, P Kairouz, S Oh, Z Xu - arXiv preprint arXiv:2402.13659, 2024 - arxiv.org
Service providers of large language model (LLM) applications collect user instructions in the
wild and use them to further align LLMs with users' intentions. These instructions, which …