Bridging offline reinforcement learning and imitation learning: A tale of pessimism

P Rashidinejad, B Zhu, C Ma, J Jiao… - Advances in Neural …, 2021 - proceedings.neurips.cc
Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from
a fixed dataset without active data collection. Based on the composition of the offline dataset …

Offline reinforcement learning with realizability and single-policy concentrability

W Zhan, B Huang, A Huang… - … on Learning Theory, 2022 - proceedings.mlr.press
Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong
assumptions on both the function classes (e.g., Bellman-completeness) and the data …

Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity

L Shi, G Li, Y Wei, Y Chen… - … conference on machine …, 2022 - proceedings.mlr.press
Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …

Pessimistic model-based offline reinforcement learning under partial coverage

M Uehara, W Sun - arXiv preprint arXiv:2107.06226, 2021 - arxiv.org
We study model-based offline Reinforcement Learning with general function approximation
without a full coverage assumption on the offline data distribution. We present an algorithm …

Transformers as decision makers: Provable in-context reinforcement learning via supervised pretraining

L Lin, Y Bai, S Mei - arXiv preprint arXiv:2310.08566, 2023 - arxiv.org
Large transformer models pretrained on offline reinforcement learning datasets have
demonstrated remarkable in-context reinforcement learning (ICRL) capabilities, where they …

Mitigating covariate shift in imitation learning via offline data with partial coverage

J Chang, M Uehara, D Sreenivas… - Advances in Neural …, 2021 - proceedings.neurips.cc
This paper studies offline Imitation Learning (IL) where an agent learns to imitate an expert
demonstrator without additional online environment interactions. Instead, the learner is …

Should I run offline reinforcement learning or behavioral cloning?

A Kumar, J Hong, A Singh, S Levine - International Conference on …, 2021 - openreview.net
Offline reinforcement learning (RL) algorithms can acquire effective policies by utilizing only
previously collected experience, without any online interaction. While it is widely understood …

When should we prefer offline reinforcement learning over behavioral cloning?

A Kumar, J Hong, A Singh, S Levine - arXiv preprint arXiv:2204.05618, 2022 - arxiv.org
Offline reinforcement learning (RL) algorithms can acquire effective policies by utilizing
previously collected experience, without any online interaction. It is widely understood that …

Imitation learning from imperfection: Theoretical justifications and algorithms

Z Li, T Xu, Z Qin, Y Yu, ZQ Luo - Advances in Neural …, 2024 - proceedings.neurips.cc
Imitation learning (IL) algorithms excel in acquiring high-quality policies from expert data for
sequential decision-making tasks. However, their effectiveness is hampered when faced with …

Welfare maximization in competitive equilibrium: Reinforcement learning for Markov exchange economy

Z Liu, M Lu, Z Wang, M Jordan… - … Conference on Machine …, 2022 - proceedings.mlr.press
We study a bilevel economic system, which we refer to as a Markov exchange economy
(MEE), from the point of view of multi-agent reinforcement learning (MARL). An MEE …