Inverse preference learning: Preference-based RL without a reward function

J Hejna, D Sadigh - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Reward functions are difficult to design and often hard to align with human intent.
Preference-based Reinforcement Learning (RL) algorithms address these problems by learning reward …

Dual RL: Unification and new methods for reinforcement and imitation learning

H Sikchi, Q Zheng, A Zhang, S Niekum - arXiv preprint arXiv:2302.08560, 2023 - arxiv.org
The goal of reinforcement learning (RL) is to find a policy that maximizes the expected
cumulative return. It has been shown that this objective can be represented as an …

Coherent soft imitation learning

J Watson, S Huang, N Heess - Advances in Neural …, 2024 - proceedings.neurips.cc
Imitation learning methods seek to learn from an expert either through behavioral cloning
(BC) for the policy or inverse reinforcement learning (IRL) for the reward. Such methods …

SequenceMatch: Imitation learning for autoregressive sequence modelling with backtracking

C Cundy, S Ermon - arXiv preprint arXiv:2306.05426, 2023 - arxiv.org
In many domains, autoregressive models can attain high likelihood on the task of predicting
the next observation. However, this maximum-likelihood (MLE) objective does not …

Score models for offline goal-conditioned reinforcement learning

H Sikchi, R Chitnis, A Touati, A Geramifard… - arXiv preprint arXiv …, 2023 - arxiv.org
Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with learning to achieve
multiple goals in an environment purely from offline datasets using sparse reward functions …

LocoMuJoCo: A Comprehensive Imitation Learning Benchmark for Locomotion

F Al-Hafez, G Zhao, J Peters, D Tateo - arXiv preprint arXiv:2311.02496, 2023 - arxiv.org
Imitation Learning (IL) holds great promise for enabling agile locomotion in embodied
agents. However, many existing locomotion benchmarks primarily focus on simplified toy …

Fast imitation via behavior foundation models

M Pirotta, A Tirinzoni, A Touati, A Lazaric… - … Foundation Models for …, 2023 - openreview.net
Imitation learning (IL) aims at producing agents that can imitate any behavior given a few
expert demonstrations. Yet existing approaches require many demonstrations and/or …

Imitation learning from observation with automatic discount scheduling

Y Liu, W Dong, Y Hu, C Wen, ZH Yin, C Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Humans often acquire new skills through observation and imitation. For robotic agents,
learning from the plethora of unlabeled video demonstration data available on the Internet …

Learning robot manipulation from cross-morphology demonstration

G Salhotra, I Liu, C Arthur, G Sukhatme - arXiv preprint arXiv:2304.03833, 2023 - arxiv.org
Some Learning from Demonstrations (LfD) methods handle small mismatches in the action
spaces of the teacher and student. Here we address the case where the teacher's …

OLLIE: Imitation Learning from Offline Pretraining to Online Finetuning

S Yue, X Hua, J Ren, S Lin, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we study offline-to-online Imitation Learning (IL) that pretrains an imitation
policy from static demonstration data, followed by fast finetuning with minimal environmental …