A survey of meta-reinforcement learning

J Beck, R Vuorio, EZ Liu, Z Xiong, L Zintgraf… - arXiv preprint arXiv …, 2023 - arxiv.org
While deep reinforcement learning (RL) has fueled multiple high-profile successes in
machine learning, it is held back from more widespread adoption by its often poor data …

IQ-Learn: Inverse soft-Q learning for imitation

D Garg, S Chakraborty, C Cundy… - Advances in Neural …, 2021 - proceedings.neurips.cc
In many sequential decision-making problems (e.g., robotics control, game playing,
sequential prediction), human or expert data is available containing useful information about …

A survey of inverse reinforcement learning

S Adams, T Cody, PA Beling - Artificial Intelligence Review, 2022 - Springer
Learning from demonstration, or imitation learning, is the process of learning to act in an
environment from examples provided by a teacher. Inverse reinforcement learning (IRL) is a …

Meta-Reward-Net: Implicitly differentiable reward learning for preference-based reinforcement learning

R Liu, F Bai, Y Du, Y Yang - Advances in Neural …, 2022 - proceedings.neurips.cc
Setting up a well-designed reward function has been challenging for many
reinforcement learning applications. Preference-based reinforcement learning (PbRL) …

Generalized decision transformer for offline hindsight information matching

H Furuta, Y Matsuo, SS Gu - arXiv preprint arXiv:2111.10364, 2021 - arxiv.org
How to extract as much learning signal from each trajectory data has been a key problem in
reinforcement learning (RL), where sample inefficiency has posed serious challenges for …

Path planning using Neural A* search

R Yonetani, T Taniai, M Barekatain… - International …, 2021 - proceedings.mlr.press
We present Neural A*, a novel data-driven search method for path planning problems.
Despite the recent increasing attention to data-driven path planning, machine learning …

Why so pessimistic? Estimating uncertainties for offline RL through ensembles, and why their independence matters

K Ghasemipour, SS Gu… - Advances in Neural …, 2022 - proceedings.neurips.cc
Motivated by the success of ensembles for uncertainty estimation in supervised learning, we
take a renewed look at how ensembles of $Q$-functions can be leveraged as the primary …

Inverse reinforcement learning as the algorithmic basis for theory of mind: current methods and open problems

J Ruiz-Serra, MS Harré - Algorithms, 2023 - mdpi.com
Theory of mind (ToM) is the psychological construct by which we model another's internal
mental states. Through ToM, we adjust our own behaviour to best suit a social context, and …

Hard choices in artificial intelligence

R Dobbe, TK Gilbert, Y Mintz - Artificial Intelligence, 2021 - Elsevier
As AI systems are integrated into high stakes social domains, researchers now examine how
to design and operate them in a safe and ethical manner. However, the criteria for identifying …

Procedure planning in instructional videos via contextual modeling and model-based policy learning

J Bi, J Luo, C Xu - … of the IEEE/CVF International Conference …, 2021 - openaccess.thecvf.com
Learning new skills by observing humans' behaviors is an essential capability of AI. In this
work, we leverage instructional videos to study humans' decision-making processes …