Inverse preference learning: Preference-based RL without a reward function
Reward functions are difficult to design and often hard to align with human intent. Preference-
based Reinforcement Learning (RL) algorithms address these problems by learning reward …
Dual RL: Unification and new methods for reinforcement and imitation learning
The goal of reinforcement learning (RL) is to find a policy that maximizes the expected
cumulative return. It has been shown that this objective can be represented as an …
Coherent soft imitation learning
Imitation learning methods seek to learn from an expert either through behavioral cloning
(BC) for the policy or inverse reinforcement learning (IRL) for the reward. Such methods …
SequenceMatch: Imitation learning for autoregressive sequence modelling with backtracking
In many domains, autoregressive models can attain high likelihood on the task of predicting
the next observation. However, this maximum-likelihood (MLE) objective does not …
Score models for offline goal-conditioned reinforcement learning
Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with learning to achieve
multiple goals in an environment purely from offline datasets using sparse reward functions …
LocoMuJoCo: A Comprehensive Imitation Learning Benchmark for Locomotion
Imitation Learning (IL) holds great promise for enabling agile locomotion in embodied
agents. However, many existing locomotion benchmarks primarily focus on simplified toy …
Fast imitation via behavior foundation models
Imitation learning (IL) aims at producing agents that can imitate any behavior given a few
expert demonstrations. Yet existing approaches require many demonstrations and/or …
Imitation learning from observation with automatic discount scheduling
Humans often acquire new skills through observation and imitation. For robotic agents,
learning from the plethora of unlabeled video demonstration data available on the Internet …
Learning robot manipulation from cross-morphology demonstration
Some Learning from Demonstrations (LfD) methods handle small mismatches in the action
spaces of the teacher and student. Here we address the case where the teacher's …
OLLIE: Imitation Learning from Offline Pretraining to Online Finetuning
In this paper, we study offline-to-online Imitation Learning (IL) that pretrains an imitation
policy from static demonstration data, followed by fast finetuning with minimal environmental …