Meta-inverse reinforcement learning with probabilistic context variables

J Beck, R Vuorio, EZ Liu, Z Xiong, L Zintgraf… - arXiv preprint arXiv …, 2023 - arxiv.org

While deep reinforcement learning (RL) has fueled multiple high-profile successes in
machine learning, it is held back from more widespread adoption by its often poor data …

Cited by 105 · Related articles · All 2 versions

IQ-Learn: Inverse soft-Q learning for imitation

D Garg, S Chakraborty, C Cundy… - Advances in Neural …, 2021 - proceedings.neurips.cc

In many sequential decision-making problems (e.g., robotics control, game playing,
sequential prediction), human or expert data is available containing useful information about …

Cited by 129 · Related articles · All 9 versions

A survey of inverse reinforcement learning

S Adams, T Cody, PA Beling - Artificial Intelligence Review, 2022 - Springer

Learning from demonstration, or imitation learning, is the process of learning to act in an
environment from examples provided by a teacher. Inverse reinforcement learning (IRL) is a …

Cited by 75 · Related articles · All 8 versions

Meta-Reward-Net: Implicitly differentiable reward learning for preference-based reinforcement learning

R Liu, F Bai, Y Du, Y Yang - Advances in Neural …, 2022 - proceedings.neurips.cc

Setting up a well-designed reward function has been challenging for many
reinforcement learning applications. Preference-based reinforcement learning (PbRL) …

Cited by 34 · Related articles · All 6 versions

Generalized decision transformer for offline hindsight information matching

H Furuta, Y Matsuo, SS Gu - arXiv preprint arXiv:2111.10364, 2021 - arxiv.org

How to extract as much learning signal from each trajectory data has been a key problem in
reinforcement learning (RL), where sample inefficiency has posed serious challenges for …

Cited by 85 · Related articles · All 5 versions

Path planning using Neural A* search

R Yonetani, T Taniai, M Barekatain… - International …, 2021 - proceedings.mlr.press

We present Neural A*, a novel data-driven search method for path planning problems.
Despite the recent increasing attention to data-driven path planning, machine learning …

Cited by 86 · Related articles · All 6 versions

Why so pessimistic? Estimating uncertainties for offline RL through ensembles, and why their independence matters

K Ghasemipour, SS Gu… - Advances in Neural …, 2022 - proceedings.neurips.cc

Motivated by the success of ensembles for uncertainty estimation in supervised learning, we
take a renewed look at how ensembles of $Q$-functions can be leveraged as the primary …

Cited by 46 · Related articles · All 6 versions

Inverse reinforcement learning as the algorithmic basis for theory of mind: Current methods and open problems

J Ruiz-Serra, MS Harré - Algorithms, 2023 - mdpi.com

Theory of mind (ToM) is the psychological construct by which we model another's internal
mental states. Through ToM, we adjust our own behaviour to best suit a social context, and …

Cited by 8 · Related articles · All 4 versions

Hard choices in artificial intelligence

R Dobbe, TK Gilbert, Y Mintz - Artificial Intelligence, 2021 - Elsevier

As AI systems are integrated into high stakes social domains, researchers now examine how
to design and operate them in a safe and ethical manner. However, the criteria for identifying …

Cited by 66 · Related articles · All 11 versions

Procedure planning in instructional videos via contextual modeling and model-based policy learning

J Bi, J Luo, C Xu - … of the IEEE/CVF International Conference …, 2021 - openaccess.thecvf.com

Learning new skills by observing humans' behaviors is an essential capability of AI. In this
work, we leverage instructional videos to study humans' decision-making processes …

Cited by 39 · Related articles · All 7 versions