AI alignment: A comprehensive survey
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …
CEIL: Generalized contextual imitation learning
In this paper, we present ContExtual Imitation Learning (CEIL), a general and broadly
applicable algorithm for imitation learning (IL). Inspired by the formulation of hindsight …
Hybrid fuzzy AHP–TOPSIS approach to prioritizing solutions for inverse reinforcement learning
V Kukreja - Complex & Intelligent Systems, 2023 - Springer
Reinforcement learning (RL) techniques nurture building up solutions for sequential
decision-making problems under uncertainty and ambiguity. RL has agents with a reward …
Inverse reinforcement learning as the algorithmic basis for theory of mind: current methods and open problems
J Ruiz-Serra, MS Harré - Algorithms, 2023 - mdpi.com
Theory of mind (ToM) is the psychological construct by which we model another's internal
mental states. Through ToM, we adjust our own behaviour to best suit a social context, and …
Maximum-likelihood inverse reinforcement learning with finite-time guarantees
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated
optimal policy that best fits observed sequences of states and actions implemented by an …
Inverse decision modeling: Learning interpretable representations of behavior
Decision analysis deals with modeling and enhancing decision processes. A principal
challenge in improving behavior is in obtaining a transparent description of existing …
State regularized policy optimization on data with dynamics shift
In many real-world scenarios, Reinforcement Learning (RL) algorithms are trained on data
with dynamics shift, i.e., with different underlying environment dynamics. A majority of current …
RL-VLM-F: Reinforcement learning from vision language foundation model feedback
Reward engineering has long been a challenge in Reinforcement Learning (RL) research,
as it often requires extensive human effort and iterative processes of trial-and-error to design …
Dual RL: Unification and new methods for reinforcement and imitation learning
The goal of reinforcement learning (RL) is to find a policy that maximizes the expected
cumulative return. It has been shown that this objective can be represented as an …
Adversarial intrinsic motivation for reinforcement learning
Learning with an objective to minimize the mismatch with a reference distribution has been
shown to be useful for generative modeling and imitation learning. In this paper, we …