Distributional Pareto-optimal multi-objective reinforcement learning
Multi-objective reinforcement learning (MORL) has been proposed to learn control policies
over multiple competing objectives with each possible preference over returns. However …
Imitation learning from vague feedback
Imitation learning from human feedback studies how to train well-performed imitation agents
with an annotator's relative comparison of two demonstrations (one demonstration is …
Error bounds of imitating policies and environments for reinforcement learning
In sequential decision-making, imitation learning (IL) trains a policy efficiently by mimicking
expert demonstrations. Various imitation methods were proposed and empirically evaluated …
Unlabeled imperfect demonstrations in adversarial imitation learning
Adversarial imitation learning has become a widely used imitation learning framework. The
discriminator is often trained by taking expert demonstrations and policy trajectories as …
Understanding adversarial imitation learning in small sample regime: A stage-coupled analysis
Imitation learning learns a policy from expert trajectories. While the expert data is believed to
be crucial for imitation quality, it was found that a kind of imitation learning approach …
Imitation learning from purified demonstration
Imitation learning has emerged as a promising approach for addressing sequential decision-
making problems, with the assumption that expert demonstrations are optimal. However, in …
Seeing differently, acting similarly: Heterogeneously observable imitation learning
In many real-world imitation learning tasks, the demonstrator and the learner have to act
under different observation spaces. This situation brings significant obstacles to existing …
Anomaly guided policy learning from imperfect demonstrations
Reinforcement Learning (RL)[38] has been widely used in many challenging sequential
decision-making tasks, such as autonomous vehicle [11, 21, 34], video game playing [3, 8 …
Visual Imitation Learning with Calibrated Contrastive Representation
Adversarial Imitation Learning (AIL) allows the agent to reproduce expert behavior with low-
dimensional states and actions. However, challenges arise in handling visual states due to …
Reinforcement learning from bagged reward
In Reinforcement Learning (RL), it is commonly assumed that an immediate reward signal is
generated for each action taken by the agent, helping the agent maximize cumulative …