IQ-Learn: Inverse soft-Q learning for imitation
In many sequential decision-making problems (e.g., robotics control, game playing,
sequential prediction), human or expert data is available containing useful information about …
Maximum-likelihood inverse reinforcement learning with finite-time guarantees
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated
optimal policy that best fits observed sequences of states and actions implemented by an …
Feedback in imitation learning: The three regimes of covariate shift
Imitation learning practitioners have often noted that conditioning policies on previous
actions leads to a dramatic divergence between "held out" error and performance of the …
Inverse decision modeling: Learning interpretable representations of behavior
Decision analysis deals with modeling and enhancing decision processes. A principal
challenge in improving behavior is in obtaining a transparent *description* of existing …
Coherent soft imitation learning
Imitation learning methods seek to learn from an expert either through behavioral cloning
(BC) for the policy or inverse reinforcement learning (IRL) for the reward. Such methods …
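The BC/IRL split this abstract names can be made concrete with a minimal behavioral-cloning sketch: BC treats imitation as supervised learning on expert state-action pairs. The toy data and logistic-regression policy below are hypothetical illustrations, not from the paper.

```python
import numpy as np

# Hypothetical expert data: 1-D states, 2 discrete actions.
# The "expert" acts on the sign of the state.
rng = np.random.default_rng(0)
states = rng.normal(size=(100, 1))
actions = (states[:, 0] > 0).astype(int)

# Behavioral cloning: fit a logistic-regression policy by gradient
# descent on the negative log-likelihood of the expert's actions.
w, b = 0.0, 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(states[:, 0] * w + b)))
    grad_w = np.mean((p - actions) * states[:, 0])
    grad_b = np.mean(p - actions)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

# Fraction of expert actions the cloned policy reproduces.
p = 1.0 / (1.0 + np.exp(-(states[:, 0] * w + b)))
accuracy = np.mean((p > 0.5) == actions)
```

IRL, by contrast, would infer a reward for which the expert is (near-)optimal rather than regressing on actions directly.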
Proximal point imitation learning
This work develops new algorithms with rigorous efficiency guarantees for infinite horizon
imitation learning (IL) with linear function approximation without restrictive coherence …
SequenceMatch: Imitation learning for autoregressive sequence modelling with backtracking
In many domains, autoregressive models can attain high likelihood on the task of predicting
the next observation. However, this maximum-likelihood (MLE) objective does not …
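The maximum-likelihood objective this abstract refers to can be illustrated with a toy autoregressive model: fit next-token probabilities to a sequence and score the sequence by its average negative log-likelihood. The bigram model and smoothing below are a hypothetical sketch, not the paper's method.

```python
import numpy as np

# Hypothetical toy sequence over a 2-token vocabulary.
seq = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
vocab = 2

# MLE bigram model: next-token probabilities from transition counts
# (add-one smoothing keeps every probability nonzero).
counts = np.ones((vocab, vocab))
for prev, nxt in zip(seq, seq[1:]):
    counts[prev, nxt] += 1
probs = counts / counts.sum(axis=1, keepdims=True)

# MLE objective: average negative log-likelihood of each next token
# given its prefix (here, the previous token).
nll = -np.mean([np.log(probs[p, n]) for p, n in zip(seq, seq[1:])])
```

Minimizing this per-token cross-entropy is exactly the next-observation MLE objective; the paper's point is that it does not directly control behavior when the model generates its own prefixes.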
A model-based solution to the offline multi-agent reinforcement learning coordination problem
Training multiple agents to coordinate is an essential problem with applications in robotics,
game theory, economics, and social sciences. However, most existing Multi-Agent …
Diffusion imitation from observation
Learning from observation (LfO) aims to imitate experts by learning from state-only
demonstrations without requiring action labels. Existing adversarial imitation learning …
All by myself: learning individualized competitive behavior with a contrastive reinforcement learning optimization
In a competitive game scenario, a set of agents have to learn decisions that maximize their
goals and minimize their adversaries' goals at the same time. Besides dealing with the …