f-IRL: Inverse reinforcement learning via state marginal matching

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Cited by 134 Related articles All 3 versions

[PDF] neurips.cc

CEIL: Generalized contextual imitation learning

J Liu, L He, Y Kang, Z Zhuang… - Advances in Neural …, 2023 - proceedings.neurips.cc

In this paper, we present ContExtual Imitation Learning (CEIL), a general and broadly
applicable algorithm for imitation learning (IL). Inspired by the formulation of hindsight …

Cited by 13 Related articles All 5 versions

[PDF] springer.com

Hybrid fuzzy AHP–TOPSIS approach to prioritizing solutions for inverse reinforcement learning

V Kukreja - Complex & Intelligent Systems, 2023 - Springer

Reinforcement learning (RL) techniques nurture building up solutions for sequential
decision-making problems under uncertainty and ambiguity. RL has agents with a reward …

Cited by 56 Related articles All 4 versions

[PDF] mdpi.com

Inverse reinforcement learning as the algorithmic basis for theory of mind: current methods and open problems

J Ruiz-Serra, MS Harré - Algorithms, 2023 - mdpi.com

Theory of mind (ToM) is the psychological construct by which we model another's internal
mental states. Through ToM, we adjust our own behaviour to best suit a social context, and …

Cited by 9 Related articles All 4 versions

[PDF] neurips.cc

Maximum-likelihood inverse reinforcement learning with finite-time guarantees

S Zeng, C Li, A Garcia, M Hong - Advances in Neural …, 2022 - proceedings.neurips.cc

Inverse reinforcement learning (IRL) aims to recover the reward function and the associated
optimal policy that best fits observed sequences of states and actions implemented by an …

Cited by 30 Related articles All 10 versions

[PDF] mlr.press

Inverse decision modeling: Learning interpretable representations of behavior

D Jarrett, A Hüyük… - … Conference on Machine …, 2021 - proceedings.mlr.press

Decision analysis deals with modeling and enhancing decision processes. A principal
challenge in improving behavior is in obtaining a transparent description of existing …

Cited by 31 Related articles All 7 versions

[PDF] neurips.cc

State regularized policy optimization on data with dynamics shift

Z Xue, Q Cai, S Liu, D Zheng… - Advances in neural …, 2024 - proceedings.neurips.cc

In many real-world scenarios, Reinforcement Learning (RL) algorithms are trained on data
with dynamics shift, i.e., with different underlying environment dynamics. A majority of current …

Cited by 8 Related articles All 7 versions

[PDF] arxiv.org

RL-VLM-F: Reinforcement learning from vision language foundation model feedback

Y Wang, Z Sun, J Zhang, Z Xian, E Biyik, D Held… - arXiv preprint arXiv …, 2024 - arxiv.org

Reward engineering has long been a challenge in Reinforcement Learning (RL) research,
as it often requires extensive human effort and iterative processes of trial-and-error to design …

Cited by 15 Related articles All 4 versions

[PDF] arxiv.org

Dual RL: Unification and new methods for reinforcement and imitation learning

H Sikchi, Q Zheng, A Zhang, S Niekum - arXiv preprint arXiv:2302.08560, 2023 - arxiv.org

The goal of reinforcement learning (RL) is to find a policy that maximizes the expected
cumulative return. It has been shown that this objective can be represented as an …

Cited by 19 Related articles All 5 versions

[PDF] neurips.cc

Adversarial intrinsic motivation for reinforcement learning

I Durugkar, M Tec, S Niekum… - Advances in Neural …, 2021 - proceedings.neurips.cc

Learning with an objective to minimize the mismatch with a reference distribution has been
shown to be useful for generative modeling and imitation learning. In this paper, we …

Cited by 37 Related articles All 10 versions