Few-shot preference learning for human-in-the-loop RL
DJ Hejna III, D Sadigh - Conference on Robot Learning, 2023 - proceedings.mlr.press
While reinforcement learning (RL) has become a more popular approach for robotics,
designing sufficiently informative reward functions for complex tasks has proven to be …
Inverse preference learning: Preference-based rl without a reward function
Reward functions are difficult to design and often hard to align with human intent. Preference-
based Reinforcement Learning (RL) algorithms address these problems by learning reward …
Contrastive preference learning: Learning from human feedback without RL
Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular
paradigm for aligning models with human intent. Typically RLHF algorithms operate in two …
Recent advances in leveraging human guidance for sequential decision-making tasks
A longstanding goal of artificial intelligence is to create artificial agents capable of learning
to perform tasks that require sequential decision making. Importantly, while it is the artificial …
Asking easy questions: A user-friendly approach to active reward learning
Robots can learn the right reward function by querying a human expert. Existing approaches
attempt to choose questions where the robot is most uncertain about the human's response; …
Promptable behaviors: Personalizing multi-objective rewards from human preferences
Customizing robotic behaviors to be aligned with diverse human preferences is an
underexplored challenge in the field of embodied AI. In this paper we present Promptable …
Active preference-based gaussian process regression for reward learning
Designing reward functions is a challenging problem in AI and robotics. Humans usually
have a difficult time directly specifying all the desirable behaviors that a robot needs to …
When humans aren't optimal: Robots that collaborate with risk-aware humans
In order to collaborate safely and efficiently, robots need to anticipate how their human
partners will behave. Some of today's robots model humans as if they were also robots, and …
Active preference-based Gaussian process regression for reward learning and optimization
Designing reward functions is a difficult task in AI and robotics. The complex task of directly
specifying all the desirable behaviors a robot needs to optimize often proves challenging for …
Sequential preference ranking for efficient reinforcement learning from human feedback
Reinforcement learning from human feedback (RLHF) alleviates the problem of designing a
task-specific reward function in reinforcement learning by learning it from human preference …