Few-shot preference learning for human-in-the-loop RL
DJ Hejna III, D Sadigh - Conference on Robot Learning, 2023 - proceedings.mlr.press
While reinforcement learning (RL) has become a more popular approach for robotics,
designing sufficiently informative reward functions for complex tasks has proven to be …
Inverse preference learning: Preference-based rl without a reward function
Reward functions are difficult to design and often hard to align with human intent. Preference-
based Reinforcement Learning (RL) algorithms address these problems by learning reward …
Contrastive preference learning: Learning from human feedback without RL
Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular
paradigm for aligning models with human intent. Typically RLHF algorithms operate in two …
Recent advances in leveraging human guidance for sequential decision-making tasks
A longstanding goal of artificial intelligence is to create artificial agents capable of learning
to perform tasks that require sequential decision making. Importantly, while it is the artificial …
Asking easy questions: A user-friendly approach to active reward learning
Robots can learn the right reward function by querying a human expert. Existing approaches
attempt to choose questions where the robot is most uncertain about the human's response; …
Promptable behaviors: Personalizing multi-objective rewards from human preferences
Customizing robotic behaviors to be aligned with diverse human preferences is an
underexplored challenge in the field of embodied AI. In this paper we present Promptable …
Active preference-based gaussian process regression for reward learning
Designing reward functions is a challenging problem in AI and robotics. Humans usually
have a difficult time directly specifying all the desirable behaviors that a robot needs to …
When humans aren't optimal: Robots that collaborate with risk-aware humans
In order to collaborate safely and efficiently, robots need to anticipate how their human
partners will behave. Some of today's robots model humans as if they were also robots, and …
Active preference-based Gaussian process regression for reward learning and optimization
Designing reward functions is a difficult task in AI and robotics. The complex task of directly
specifying all the desirable behaviors a robot needs to optimize often proves challenging for …
Sequential preference ranking for efficient reinforcement learning from human feedback
Reinforcement learning from human feedback (RLHF) alleviates the problem of designing a
task-specific reward function in reinforcement learning by learning it from human preference …