The language barrier: Dissecting safety challenges of LLMs in multilingual contexts
As the influence of large language models (LLMs) spans across global communities, their
safety challenges in multilingual settings become paramount for alignment research. This …
The alignment ceiling: Objective mismatch in reinforcement learning from human feedback
N Lambert, R Calandra - arXiv preprint arXiv:2311.00168, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique
to make large language models (LLMs) more capable in complex settings. RLHF proceeds …
Entangled preferences: The history and risks of reinforcement learning and human feedback
Reinforcement learning from human feedback (RLHF) has emerged as a powerful technique
to make large language models (LLMs) easier to use and more effective. A core piece of the …
Transforming and Combining Rewards for Aligning Large Language Models
A common approach for aligning language models to human preferences is to first learn a
reward model from preference data, and then use this reward model to update the language …
It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
Reinforcement Learning from Human Feedback (RLHF) involves training policy models
(PMs) and reward models (RMs) to align language models with human preferences. Instead …
Research Agenda for Sociotechnical Approaches to AI Safety
As the capabilities of AI systems continue to advance, it is increasingly important that we
guide the development of these powerful technologies, ensuring they are used for the …