Inverse preference learning: Preference-based RL without a reward function
Reward functions are difficult to design and often hard to align with human intent. Preference-
based Reinforcement Learning (RL) algorithms address these problems by learning reward …
Dual RL: Unification and new methods for reinforcement and imitation learning
The goal of reinforcement learning (RL) is to find a policy that maximizes the expected
cumulative return. It has been shown that this objective can be represented as an …
Coherent soft imitation learning
Imitation learning methods seek to learn from an expert either through behavioral cloning
(BC) for the policy or inverse reinforcement learning (IRL) for the reward. Such methods …
SequenceMatch: Imitation learning for autoregressive sequence modelling with backtracking
In many domains, autoregressive models can attain high likelihood on the task of predicting
the next observation. However, this maximum-likelihood (MLE) objective does not …
Score models for offline goal-conditioned reinforcement learning
Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with learning to achieve
multiple goals in an environment purely from offline datasets using sparse reward functions …
LocoMuJoCo: A Comprehensive Imitation Learning Benchmark for Locomotion
Imitation Learning (IL) holds great promise for enabling agile locomotion in embodied
agents. However, many existing locomotion benchmarks primarily focus on simplified toy …
Fast imitation via behavior foundation models
Imitation learning (IL) aims at producing agents that can imitate any behavior given a few
expert demonstrations. Yet existing approaches require many demonstrations and/or …
Imitation learning from observation with automatic discount scheduling
Humans often acquire new skills through observation and imitation. For robotic agents,
learning from the plethora of unlabeled video demonstration data available on the Internet …
Learning robot manipulation from cross-morphology demonstration
Some Learning from Demonstrations (LfD) methods handle small mismatches in the action
spaces of the teacher and student. Here we address the case where the teacher's …
OLLIE: Imitation Learning from Offline Pretraining to Online Finetuning
In this paper, we study offline-to-online Imitation Learning (IL) that pretrains an imitation
policy from static demonstration data, followed by fast finetuning with minimal environmental …