Learning latent representations to influence multi-agent interaction
Seamlessly interacting with humans or robots is hard because these agents are non-
stationary. They update their policy in response to the ego agent's behavior, and the ego …
A survey of reinforcement learning from human feedback
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …
Asking easy questions: A user-friendly approach to active reward learning
Robots can learn the right reward function by querying a human expert. Existing approaches
attempt to choose questions where the robot is most uncertain about the human's response; …
Active preference-based Gaussian process regression for reward learning
Designing reward functions is a challenging problem in AI and robotics. Humans usually
have a difficult time directly specifying all the desirable behaviors that a robot needs to …
When humans aren't optimal: Robots that collaborate with risk-aware humans
In order to collaborate safely and efficiently, robots need to anticipate how their human
partners will behave. Some of today's robots model humans as if they were also robots, and …
Active preference-based Gaussian process regression for reward learning and optimization
Designing reward functions is a difficult task in AI and robotics. The complex task of directly
specifying all the desirable behaviors a robot needs to optimize often proves challenging for …
Learning multimodal rewards from rankings
Learning from human feedback has been shown to be a useful approach for acquiring robot
reward functions. However, expert feedback is often assumed to be drawn from an …
Reinforcement learning based control of imitative policies for near-accident driving
Autonomous driving has achieved significant progress in recent years, but autonomous cars
are still unable to tackle high-risk situations where a potential accident is likely. In such near …
Learning reward functions from diverse sources of human feedback: Optimally integrating demonstrations and preferences
Reward functions are a common way to specify the objective of a robot. As designing reward
functions can be extremely challenging, a more promising approach is to directly learn …
Llf-bench: Benchmark for interactive learning from language feedback
We introduce a new benchmark, LLF-Bench (Learning from Language Feedback
Benchmark; pronounced as "elf-bench"), to evaluate the ability of AI agents to interactively …