Learning latent representations to influence multi-agent interaction

A Xie, D Losey, R Tolsma, C Finn… - Conference on robot …, 2021 - proceedings.mlr.press
Seamlessly interacting with humans or robots is hard because these agents are non-
stationary. They update their policy in response to the ego agent's behavior, and the ego …

A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …

Asking easy questions: A user-friendly approach to active reward learning

E Bıyık, M Palan, NC Landolfi, DP Losey… - arXiv preprint arXiv …, 2019 - arxiv.org
Robots can learn the right reward function by querying a human expert. Existing approaches
attempt to choose questions where the robot is most uncertain about the human's response; …

Active preference-based gaussian process regression for reward learning

E Bıyık, N Huynh, MJ Kochenderfer… - arXiv preprint arXiv …, 2020 - arxiv.org
Designing reward functions is a challenging problem in AI and robotics. Humans usually
have a difficult time directly specifying all the desirable behaviors that a robot needs to …

When humans aren't optimal: Robots that collaborate with risk-aware humans

M Kwon, E Biyik, A Talati, K Bhasin, DP Losey… - Proceedings of the …, 2020 - dl.acm.org
In order to collaborate safely and efficiently, robots need to anticipate how their human
partners will behave. Some of today's robots model humans as if they were also robots, and …

Active preference-based Gaussian process regression for reward learning and optimization

E Bıyık, N Huynh, MJ Kochenderfer… - … Journal of Robotics …, 2024 - journals.sagepub.com
Designing reward functions is a difficult task in AI and robotics. The complex task of directly
specifying all the desirable behaviors a robot needs to optimize often proves challenging for …

Learning multimodal rewards from rankings

V Myers, E Biyik, N Anari… - Conference on robot …, 2022 - proceedings.mlr.press
Learning from human feedback has been shown to be a useful approach for acquiring robot
reward functions. However, expert feedback is often assumed to be drawn from an …

Reinforcement learning based control of imitative policies for near-accident driving

Z Cao, E Bıyık, WZ Wang, A Raventos, A Gaidon… - arXiv preprint arXiv …, 2020 - arxiv.org
Autonomous driving has achieved significant progress in recent years, but autonomous cars
are still unable to tackle high-risk situations where a potential accident is likely. In such near …

Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences

E Bıyık, DP Losey, M Palan… - … Journal of Robotics …, 2022 - journals.sagepub.com
Reward functions are a common way to specify the objective of a robot. As designing reward
functions can be extremely challenging, a more promising approach is to directly learn …

Llf-bench: Benchmark for interactive learning from language feedback

CA Cheng, A Kolobov, D Misra, A Nie… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce a new benchmark, LLF-Bench (Learning from Language Feedback
Benchmark; pronounced as "elf-bench"), to evaluate the ability of AI agents to interactively …