Bridging offline reinforcement learning and imitation learning: A tale of pessimism
Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from
a fixed dataset without active data collection. Based on the composition of the offline dataset …
Offline reinforcement learning with realizability and single-policy concentrability
Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong
assumptions on both the function classes (e.g., Bellman-completeness) and the data …
Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity
Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …
Pessimistic model-based offline reinforcement learning under partial coverage
We study model-based offline Reinforcement Learning with general function approximation
without a full coverage assumption on the offline data distribution. We present an algorithm …
Transformers as decision makers: Provable in-context reinforcement learning via supervised pretraining
Large transformer models pretrained on offline reinforcement learning datasets have
demonstrated remarkable in-context reinforcement learning (ICRL) capabilities, where they …
Mitigating covariate shift in imitation learning via offline data with partial coverage
This paper studies offline Imitation Learning (IL) where an agent learns to imitate an expert
demonstrator without additional online environment interactions. Instead, the learner is …
Should I run offline reinforcement learning or behavioral cloning?
Offline reinforcement learning (RL) algorithms can acquire effective policies by utilizing only
previously collected experience, without any online interaction. While it is widely understood …
When should we prefer offline reinforcement learning over behavioral cloning?
Offline reinforcement learning (RL) algorithms can acquire effective policies by utilizing
previously collected experience, without any online interaction. It is widely understood that …
Imitation learning from imperfection: Theoretical justifications and algorithms
Imitation learning (IL) algorithms excel in acquiring high-quality policies from expert data for
sequential decision-making tasks. However, their effectiveness is hampered when faced with …
Welfare maximization in competitive equilibrium: Reinforcement learning for Markov exchange economy
We study a bilevel economic system, which we refer to as a Markov exchange economy
(MEE), from the point of view of multi-agent reinforcement learning (MARL). An MEE …