Bridging offline reinforcement learning and imitation learning: A tale of pessimism

P Rashidinejad, B Zhu, C Ma, J Jiao… - Advances in Neural …, 2021 - proceedings.neurips.cc
Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from
a fixed dataset without active data collection. Based on the composition of the offline dataset …

Offline reinforcement learning with realizability and single-policy concentrability

W Zhan, B Huang, A Huang… - … on Learning Theory, 2022 - proceedings.mlr.press
Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong
assumptions on both the function classes (e.g., Bellman-completeness) and the data …

Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity

L Shi, G Li, Y Wei, Y Chen… - … conference on machine …, 2022 - proceedings.mlr.press
Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …

Pessimistic model-based offline reinforcement learning under partial coverage

M Uehara, W Sun - arXiv preprint arXiv:2107.06226, 2021 - arxiv.org
We study model-based offline Reinforcement Learning with general function approximation
without a full coverage assumption on the offline data distribution. We present an algorithm …

Transformers as decision makers: Provable in-context reinforcement learning via supervised pretraining

L Lin, Y Bai, S Mei - arXiv preprint arXiv:2310.08566, 2023 - arxiv.org
Large transformer models pretrained on offline reinforcement learning datasets have
demonstrated remarkable in-context reinforcement learning (ICRL) capabilities, where they …

Mitigating covariate shift in imitation learning via offline data with partial coverage

J Chang, M Uehara, D Sreenivas… - Advances in Neural …, 2021 - proceedings.neurips.cc
This paper studies offline Imitation Learning (IL) where an agent learns to imitate an expert
demonstrator without additional online environment interactions. Instead, the learner is …

Should I run offline reinforcement learning or behavioral cloning?

A Kumar, J Hong, A Singh, S Levine - International Conference on …, 2021 - openreview.net
Offline reinforcement learning (RL) algorithms can acquire effective policies by utilizing only
previously collected experience, without any online interaction. While it is widely understood …

When should we prefer offline reinforcement learning over behavioral cloning?

A Kumar, J Hong, A Singh, S Levine - arXiv preprint arXiv:2204.05618, 2022 - arxiv.org
Offline reinforcement learning (RL) algorithms can acquire effective policies by utilizing
previously collected experience, without any online interaction. It is widely understood that …

Imitation learning from imperfection: Theoretical justifications and algorithms

Z Li, T Xu, Z Qin, Y Yu, ZQ Luo - Advances in Neural …, 2024 - proceedings.neurips.cc
Imitation learning (IL) algorithms excel in acquiring high-quality policies from expert data for
sequential decision-making tasks. However, their effectiveness is hampered when faced with …

Welfare maximization in competitive equilibrium: Reinforcement learning for Markov exchange economy

Z Liu, M Lu, Z Wang, M Jordan… - … Conference on Machine …, 2022 - proceedings.mlr.press
We study a bilevel economic system, which we refer to as a Markov exchange economy
(MEE), from the point of view of multi-agent reinforcement learning (MARL). An MEE …