Offline reinforcement learning as one big sequence modeling problem
Reinforcement learning (RL) is typically viewed as the problem of estimating single-step
policies (for model-free RL) or single-step models (for model-based RL), leveraging the …
Bridging offline reinforcement learning and imitation learning: A tale of pessimism
Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from
a fixed dataset without active data collection. Based on the composition of the offline dataset …
Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity
Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …
Policy finetuning: Bridging sample-efficient offline and online reinforcement learning
Recent theoretical work studies sample-efficient reinforcement learning (RL) extensively in
two settings: learning interactively in the environment (online RL), or learning from an offline …
Representation learning for online and offline RL in low-rank MDPs
This work studies the question of Representation Learning in RL: how can we learn a
compact low-dimensional representation such that on top of the representation we can …
Pessimistic model-based offline reinforcement learning under partial coverage
We study model-based offline Reinforcement Learning with general function approximation
without a full coverage assumption on the offline data distribution. We present an algorithm …
The curious price of distributional robustness in reinforcement learning with a generative model
This paper investigates model robustness in reinforcement learning (RL) via the framework
of distributionally robust Markov decision processes (RMDPs). Despite recent efforts, the …
Leveraging offline data in online reinforcement learning
A Wagenmaker, A Pacchiano - International Conference on …, 2023 - proceedings.mlr.press
Two central paradigms have emerged in the reinforcement learning (RL) community: online
RL and offline RL. In the online RL setting, the agent has no prior knowledge of the …
Towards instance-optimal offline reinforcement learning with pessimism
We study the offline reinforcement learning (offline RL) problem, where the goal is to
learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) …
Mitigating covariate shift in imitation learning via offline data with partial coverage
This paper studies offline Imitation Learning (IL) where an agent learns to imitate an expert
demonstrator without additional online environment interactions. Instead, the learner is …