Distributional Pareto-optimal multi-objective reinforcement learning

XQ Cai, P Zhang, L Zhao, J Bian… - Advances in …, 2024 - proceedings.neurips.cc
Multi-objective reinforcement learning (MORL) has been proposed to learn control policies
over multiple competing objectives with each possible preference over returns. However …

Imitation learning from vague feedback

XQ Cai, YJ Zhang, CK Chiang… - Advances in Neural …, 2023 - proceedings.neurips.cc
Imitation learning from human feedback studies how to train well-performed imitation agents
with an annotator's relative comparison of two demonstrations (one demonstration is …

Error bounds of imitating policies and environments for reinforcement learning

T Xu, Z Li, Y Yu - IEEE Transactions on Pattern Analysis and …, 2021 - ieeexplore.ieee.org
In sequential decision-making, imitation learning (IL) trains a policy efficiently by mimicking
expert demonstrations. Various imitation methods were proposed and empirically evaluated …

Unlabeled imperfect demonstrations in adversarial imitation learning

Y Wang, B Du, C Xu - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Adversarial imitation learning has become a widely used imitation learning framework. The
discriminator is often trained by taking expert demonstrations and policy trajectories as …

Understanding adversarial imitation learning in small sample regime: A stage-coupled analysis

T Xu, Z Li, Y Yu, ZQ Luo - arXiv preprint arXiv:2208.01899, 2022 - arxiv.org
Imitation learning learns a policy from expert trajectories. While the expert data is believed to
be crucial for imitation quality, it was found that a kind of imitation learning approach …

Imitation learning from purified demonstration

Y Wang, M Dong, B Du, C Xu - arXiv preprint arXiv:2310.07143, 2023 - arxiv.org
Imitation learning has emerged as a promising approach for addressing sequential
decision-making problems, with the assumption that expert demonstrations are optimal. However, in …

Seeing differently, acting similarly: Heterogeneously observable imitation learning

XQ Cai, YX Ding, ZX Chen, Y Jiang… - arXiv preprint arXiv …, 2021 - arxiv.org
In many real-world imitation learning tasks, the demonstrator and the learner have to act
under different observation spaces. This situation brings significant obstacles to existing …

Anomaly guided policy learning from imperfect demonstrations

ZX Chen, XQ Cai, Y Jiang, ZH Zhou - Proceedings of the 21st …, 2022 - ifaamas.org
Reinforcement Learning (RL) [38] has been widely used in many challenging sequential
decision-making tasks, such as autonomous vehicle [11, 21, 34], video game playing [3, 8 …

Visual Imitation Learning with Calibrated Contrastive Representation

Y Wang, L Tao, B Du, Y Lin, C Xu - arXiv preprint arXiv:2401.11396, 2024 - arxiv.org
Adversarial Imitation Learning (AIL) allows the agent to reproduce expert behavior with
low-dimensional states and actions. However, challenges arise in handling visual states due to …

Reinforcement learning from bagged reward

Y Tang, XQ Cai, YX Ding, Q Wu, G Liu… - ICML 2024 Workshop …, 2024 - openreview.net
In Reinforcement Learning (RL), it is commonly assumed that an immediate reward signal is
generated for each action taken by the agent, helping the agent maximize cumulative …