IQ-Learn: Inverse soft-Q learning for imitation
In many sequential decision-making problems (e.g., robotics control, game playing,
sequential prediction), human or expert data is available containing useful information about …
Maximum-likelihood inverse reinforcement learning with finite-time guarantees
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated
optimal policy that best fits observed sequences of states and actions implemented by an …
Feedback in imitation learning: The three regimes of covariate shift
Imitation learning practitioners have often noted that conditioning policies on previous
actions leads to a dramatic divergence between "held out" error and performance of the …
Inverse decision modeling: Learning interpretable representations of behavior
Decision analysis deals with modeling and enhancing decision processes. A principal
challenge in improving behavior is in obtaining a transparent *description* of existing …
Coherent soft imitation learning
Imitation learning methods seek to learn from an expert either through behavioral cloning
(BC) for the policy or inverse reinforcement learning (IRL) for the reward. Such methods …
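The BC/IRL split this abstract names can be made concrete with a minimal behavioral-cloning sketch: BC treats imitation as supervised learning on expert state-action pairs. The toy data and logistic-regression policy below are hypothetical illustrations, not from the paper.

```python
import numpy as np

# Hypothetical expert data: 1-D states, 2 discrete actions.
# The "expert" acts on the sign of the state.
rng = np.random.default_rng(0)
states = rng.normal(size=(100, 1))
actions = (states[:, 0] > 0).astype(int)

# Behavioral cloning: fit a logistic-regression policy by gradient
# descent on the negative log-likelihood of the expert's actions.
w, b = 0.0, 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(states[:, 0] * w + b)))
    grad_w = np.mean((p - actions) * states[:, 0])
    grad_b = np.mean(p - actions)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

# Fraction of expert actions the cloned policy reproduces.
p = 1.0 / (1.0 + np.exp(-(states[:, 0] * w + b)))
accuracy = np.mean((p > 0.5) == actions)
```

IRL, by contrast, would infer a reward for which the expert is (near-)optimal rather than regressing on actions directly.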
Proximal point imitation learning
This work develops new algorithms with rigorous efficiency guarantees for infinite horizon
imitation learning (IL) with linear function approximation without restrictive coherence …
SequenceMatch: Imitation learning for autoregressive sequence modelling with backtracking
In many domains, autoregressive models can attain high likelihood on the task of predicting
the next observation. However, this maximum-likelihood (MLE) objective does not …
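The maximum-likelihood objective this abstract refers to can be illustrated with a toy autoregressive model: fit next-token probabilities to a sequence and score the sequence by its average negative log-likelihood. The bigram model and smoothing below are a hypothetical sketch, not the paper's method.

```python
import numpy as np

# Hypothetical toy sequence over a 2-token vocabulary.
seq = [0, 1, 0, 1, 1, 0, 1, 0, 0, 1]
vocab = 2

# MLE bigram model: next-token probabilities from transition counts
# (add-one smoothing keeps every probability nonzero).
counts = np.ones((vocab, vocab))
for prev, nxt in zip(seq, seq[1:]):
    counts[prev, nxt] += 1
probs = counts / counts.sum(axis=1, keepdims=True)

# MLE objective: average negative log-likelihood of each next token
# given its prefix (here, the previous token).
nll = -np.mean([np.log(probs[p, n]) for p, n in zip(seq, seq[1:])])
```

Minimizing this per-token cross-entropy is exactly the next-observation MLE objective; the paper's point is that it does not directly control behavior when the model generates its own prefixes.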
A model-based solution to the offline multi-agent reinforcement learning coordination problem
Training multiple agents to coordinate is an essential problem with applications in robotics,
game theory, economics, and social sciences. However, most existing Multi-Agent …
Diffusion imitation from observation
Learning from observation (LfO) aims to imitate experts by learning from state-only
demonstrations without requiring action labels. Existing adversarial imitation learning …
All by myself: learning individualized competitive behavior with a contrastive reinforcement learning optimization
In a competitive game scenario, a set of agents have to learn decisions that maximize their
goals and minimize their adversaries' goals at the same time. Besides dealing with the …