SimPO: Simple preference optimization with a reference-free reward
Direct Preference Optimization (DPO) is a widely used offline preference optimization
algorithm that reparameterizes reward functions in reinforcement learning from human …
Self-exploring language models: Active preference elicitation for online alignment
Preference optimization, particularly through Reinforcement Learning from Human
Feedback (RLHF), has achieved significant success in aligning Large Language Models …
Alignment of diffusion models: Fundamentals, challenges, and future
Diffusion models have emerged as the leading paradigm in generative modeling, excelling
in various applications. Despite their success, these models often misalign with human …
Scaling laws for reward model overoptimization in direct alignment algorithms
Reinforcement Learning from Human Feedback (RLHF) has been crucial to the recent
success of Large Language Models (LLMs); however, it is often a complex and brittle …
Preference tuning with human feedback on language, speech, and vision tasks: A survey
Preference tuning is a crucial process for aligning deep generative models with human
preferences. This survey offers a thorough overview of recent advancements in preference …
Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment
H Sun, M van der Schaar - arXiv preprint arXiv:2405.15624, 2024 - arxiv.org
Aligning Large Language Models (LLMs) is crucial for enhancing their safety and utility.
However, existing methods, primarily based on preference datasets, face challenges such …
The importance of online data: Understanding preference fine-tuning via coverage
Learning from human preference data has emerged as the dominant paradigm for fine-
tuning large language models (LLMs). The two most common families of techniques--online …
Optimal Design for Reward Modeling in RLHF
Reinforcement Learning from Human Feedback (RLHF) has become a popular approach to
align language models (LMs) with human preferences. This method involves collecting a …
MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization
As large language models (LLMs) are rapidly advancing and achieving near-human
capabilities, aligning them with human values is becoming more urgent. In scenarios where …
Sample-Efficient Alignment for LLMs
We study methods for efficiently aligning large language models (LLMs) with human
preferences given budgeted online feedback. We first formulate the LLM alignment problem …