Few-shot preference learning for human-in-the-loop RL

DJ Hejna III, D Sadigh - Conference on Robot Learning, 2023 - proceedings.mlr.press
While reinforcement learning (RL) has become a more popular approach for robotics,
designing sufficiently informative reward functions for complex tasks has proven to be …

Inverse preference learning: Preference-based RL without a reward function

J Hejna, D Sadigh - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Reward functions are difficult to design and often hard to align with human intent. Preference-
based Reinforcement Learning (RL) algorithms address these problems by learning reward …

Contrastive preference learning: Learning from human feedback without RL

J Hejna, R Rafailov, H Sikchi, C Finn, S Niekum… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular
paradigm for aligning models with human intent. Typically RLHF algorithms operate in two …
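The two-phase RLHF recipe this snippet refers to typically starts by fitting a reward model to pairwise human preferences under a Bradley-Terry likelihood, and only then runs RL against that learned reward. Below is a minimal sketch of that first phase; the network architecture, function names, and synthetic data are illustrative assumptions, not code from any of the papers listed here.

```python
# Hedged sketch of RLHF "phase one": fit a reward model to pairwise
# preferences with a Bradley-Terry likelihood. All names, shapes, and the
# synthetic data are illustrative assumptions.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, segment: torch.Tensor) -> torch.Tensor:
        # segment: (batch, time, obs_dim) -> predicted return per segment
        return self.net(segment).squeeze(-1).sum(dim=-1)

def preference_loss(model, seg_a, seg_b, prefs):
    # prefs[i] = 1.0 if the human preferred segment A in pair i, else 0.0.
    # Bradley-Terry: P(A preferred over B) = sigmoid(R(A) - R(B)).
    logits = model(seg_a) - model(seg_b)
    return nn.functional.binary_cross_entropy_with_logits(logits, prefs)

if __name__ == "__main__":
    obs_dim, batch, horizon = 8, 32, 25
    model = RewardModel(obs_dim)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Synthetic stand-in for a labeled preference dataset.
    seg_a = torch.randn(batch, horizon, obs_dim)
    seg_b = torch.randn(batch, horizon, obs_dim)
    prefs = torch.randint(0, 2, (batch,)).float()
    for _ in range(100):
        opt.zero_grad()
        loss = preference_loss(model, seg_a, seg_b, prefs)
        loss.backward()
        opt.step()
```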

Recent advances in leveraging human guidance for sequential decision-making tasks

R Zhang, F Torabi, G Warnell, P Stone - Autonomous Agents and Multi …, 2021 - Springer
A longstanding goal of artificial intelligence is to create artificial agents capable of learning
to perform tasks that require sequential decision making. Importantly, while it is the artificial …

Asking easy questions: A user-friendly approach to active reward learning

E Bıyık, M Palan, NC Landolfi, DP Losey… - arXiv preprint arXiv …, 2019 - arxiv.org
Robots can learn the right reward function by querying a human expert. Existing approaches
attempt to choose questions where the robot is most uncertain about the human's response; …
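The "most uncertain" query strategy this snippet contrasts against can be sketched as choosing the candidate pair whose predicted preference probability sits closest to 0.5 under the current reward model. The helper below is a hedged illustration of that baseline, not of the easy-questions method the paper itself proposes; the function names and toy reward are assumptions.

```python
# Hedged sketch of uncertainty-maximizing preference query selection:
# ask about the pair the current model is least sure about.
import torch

def most_uncertain_query(reward_fn, candidate_pairs):
    """Return the index of the candidate pair with preference probability
    closest to 0.5. reward_fn maps a (batch, time, obs_dim) tensor to a
    (batch,) tensor of predicted segment returns."""
    best_idx, best_gap = None, float("inf")
    with torch.no_grad():
        for i, (seg_a, seg_b) in enumerate(candidate_pairs):
            logit = reward_fn(seg_a.unsqueeze(0)) - reward_fn(seg_b.unsqueeze(0))
            p_a = torch.sigmoid(logit).item()  # Bradley-Terry P(A preferred)
            gap = abs(p_a - 0.5)               # 0.5 = maximally uncertain
            if gap < best_gap:
                best_idx, best_gap = i, gap
    return best_idx

if __name__ == "__main__":
    # Toy stand-in reward: sum of the first observation feature over time.
    toy_reward = lambda seg: seg[..., 0].sum(dim=-1)
    pairs = [(torch.randn(25, 8), torch.randn(25, 8)) for _ in range(10)]
    print(most_uncertain_query(toy_reward, pairs))
```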

Promptable behaviors: Personalizing multi-objective rewards from human preferences

M Hwang, L Weihs, C Park, K Lee… - Proceedings of the …, 2024 - openaccess.thecvf.com
Customizing robotic behaviors to be aligned with diverse human preferences is an
underexplored challenge in the field of embodied AI. In this paper we present Promptable …

Active preference-based Gaussian process regression for reward learning

E Bıyık, N Huynh, MJ Kochenderfer… - arXiv preprint arXiv …, 2020 - arxiv.org
Designing reward functions is a challenging problem in AI and robotics. Humans usually
have a difficult time directly specifying all the desirable behaviors that a robot needs to …

When humans aren't optimal: Robots that collaborate with risk-aware humans

M Kwon, E Bıyık, A Talati, K Bhasin, DP Losey… - Proceedings of the …, 2020 - dl.acm.org
In order to collaborate safely and efficiently, robots need to anticipate how their human
partners will behave. Some of today's robots model humans as if they were also robots, and …

Active preference-based Gaussian process regression for reward learning and optimization

E Bıyık, N Huynh, MJ Kochenderfer… - … Journal of Robotics …, 2024 - journals.sagepub.com
Designing reward functions is a difficult task in AI and robotics. The complex task of directly
specifying all the desirable behaviors a robot needs to optimize often proves challenging for …

Sequential preference ranking for efficient reinforcement learning from human feedback

M Hwang, G Lee, H Kee, CW Kim… - Advances in Neural …, 2024 - proceedings.neurips.cc
Reinforcement learning from human feedback (RLHF) alleviates the problem of designing a
task-specific reward function in reinforcement learning by learning it from human preference …