Learning latent representations to influence multi-agent interaction

A Xie, D Losey, R Tolsma, C Finn… - Conference on robot …, 2021 - proceedings.mlr.press
Seamlessly interacting with humans or robots is hard because these agents are non-
stationary. They update their policy in response to the ego agent's behavior, and the ego …

A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …

Asking easy questions: A user-friendly approach to active reward learning

E Bıyık, M Palan, NC Landolfi, DP Losey… - arXiv preprint arXiv …, 2019 - arxiv.org
Robots can learn the right reward function by querying a human expert. Existing approaches
attempt to choose questions where the robot is most uncertain about the human's response; …

Active preference-based gaussian process regression for reward learning

E Bıyık, N Huynh, MJ Kochenderfer… - arXiv preprint arXiv …, 2020 - arxiv.org
Designing reward functions is a challenging problem in AI and robotics. Humans usually
have a difficult time directly specifying all the desirable behaviors that a robot needs to …

When humans aren't optimal: Robots that collaborate with risk-aware humans

M Kwon, E Biyik, A Talati, K Bhasin, DP Losey… - Proceedings of the …, 2020 - dl.acm.org
In order to collaborate safely and efficiently, robots need to anticipate how their human
partners will behave. Some of today's robots model humans as if they were also robots, and …

Active preference-based Gaussian process regression for reward learning and optimization

E Bıyık, N Huynh, MJ Kochenderfer… - … Journal of Robotics …, 2024 - journals.sagepub.com
Designing reward functions is a difficult task in AI and robotics. The complex task of directly
specifying all the desirable behaviors a robot needs to optimize often proves challenging for …

Learning multimodal rewards from rankings

V Myers, E Biyik, N Anari… - Conference on robot …, 2022 - proceedings.mlr.press
Learning from human feedback has been shown to be a useful approach for acquiring robot
reward functions. However, expert feedback is often assumed to be drawn from an …

Reinforcement learning based control of imitative policies for near-accident driving

Z Cao, E Bıyık, WZ Wang, A Raventos, A Gaidon… - arXiv preprint arXiv …, 2020 - arxiv.org
Autonomous driving has achieved significant progress in recent years, but autonomous cars
are still unable to tackle high-risk situations where a potential accident is likely. In such near …

Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences

E Bıyık, DP Losey, M Palan… - … Journal of Robotics …, 2022 - journals.sagepub.com
Reward functions are a common way to specify the objective of a robot. As designing reward
functions can be extremely challenging, a more promising approach is to directly learn …

Llf-bench: Benchmark for interactive learning from language feedback

CA Cheng, A Kolobov, D Misra, A Nie… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce a new benchmark, LLF-Bench (Learning from Language Feedback
Benchmark; pronounced as "elf-bench"), to evaluate the ability of AI agents to interactively …