A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …
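
To make the snippet's core idea concrete, here is a minimal, hypothetical sketch (not code from the survey): a reward model is fit to pairwise human preference labels via a Bradley-Terry model, and the learned reward then stands in for an engineered reward function when training an RL agent. The feature representation, toy data, and dimensions below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each trajectory tau is summarized by a feature vector phi(tau).
# A (simulated) human compares pairs of trajectories and labels the preferred one.
dim = 4
true_w = rng.normal(size=dim)            # hidden "true" human preference (assumption)
pairs = rng.normal(size=(500, 2, dim))   # 500 comparisons of (tau_a, tau_b)

# Simulated human labels: 1 if tau_a is preferred, drawn from a Bradley-Terry model.
logits = pairs[:, 0] @ true_w - pairs[:, 1] @ true_w
labels = (rng.uniform(size=500) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

# Fit a linear reward model r(tau) = w . phi(tau) by maximizing the
# Bradley-Terry log-likelihood with plain gradient ascent.
w = np.zeros(dim)
lr = 0.1
for _ in range(200):
    z = pairs[:, 0] @ w - pairs[:, 1] @ w                  # reward differences
    p = 1.0 / (1.0 + np.exp(-z))                           # P(tau_a preferred)
    grad = ((labels - p)[:, None] * (pairs[:, 0] - pairs[:, 1])).mean(axis=0)
    w += lr * grad

# The learned reward model can now score new trajectories for an RL agent,
# replacing a hand-engineered reward function.
print("cosine(true_w, learned_w) =",
      true_w @ w / (np.linalg.norm(true_w) * np.linalg.norm(w)))

The learned weights recover the direction of the simulated preference, which is the property an RL policy-optimization step would rely on; any real RLHF pipeline would replace the linear model with a neural reward model over prompt-response pairs.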

Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions

M Miranda, ES Ruzzetti, A Santilli, FM Zanzotto… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) represent a significant advancement in artificial
intelligence, finding applications across various domains. However, their reliance on …

Privacy-Preserving Instructions for Aligning Large Language Models

D Yu, P Kairouz, S Oh, Z Xu - arXiv preprint arXiv:2402.13659, 2024 - arxiv.org
Service providers of large language model (LLM) applications collect user instructions in the
wild and use them to further align LLMs with users' intentions. These instructions, which …