A review of off-policy evaluation in reinforcement learning
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …
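For orientation, the central object in such a review is an estimator of a target policy's value computed from data logged by a different behavior policy. A minimal sketch is the trajectory-wise importance sampling estimator (standard notation assumed here, not necessarily the review's):

$$ \hat V_{\mathrm{IS}}(\pi) = \frac{1}{n} \sum_{i=1}^{n} \left( \prod_{t=0}^{H-1} \frac{\pi(a_t^{(i)} \mid s_t^{(i)})}{\mu(a_t^{(i)} \mid s_t^{(i)})} \right) \sum_{t=0}^{H-1} \gamma^{t} r_t^{(i)}, $$

where $\mu$ is the behavior policy that generated the $n$ logged trajectories of horizon $H$. The estimator is unbiased whenever $\mu$ has support wherever $\pi$ does, but its variance grows with the horizon, which motivates much of the literature such a review surveys.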
Is pessimism provably efficient for offline RL?
We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …
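The pessimism principle studied here penalizes value estimates by their statistical uncertainty. A minimal sketch of the pessimistic value-iteration template (notation chosen here for illustration; the paper's exact penalty and truncation depend on the setting):

$$ \hat Q_h(s,a) = \max\Big\{ 0,\; \hat r_h(s,a) + \big(\widehat{\mathbb P}_h \hat V_{h+1}\big)(s,a) - \Gamma_h(s,a) \Big\}, \qquad \hat V_h(s) = \max_a \hat Q_h(s,a), $$

where $\Gamma_h$ is a data-dependent bonus upper-bounding the estimation error. Subtracting it (rather than adding it, as in optimistic online exploration) makes $\hat V$ a high-probability lower bound on the true value, so the greedy policy is never rewarded for visiting poorly covered state-action pairs.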
Bellman-consistent pessimism for offline reinforcement learning
The use of pessimism when reasoning about datasets lacking exhaustive exploration has
recently gained prominence in offline reinforcement learning. Despite the robustness it adds …
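The distinguishing idea relative to point-wise (bonus-based) pessimism is to be pessimistic only over value functions that remain consistent with the Bellman equations on the data. Schematically (a sketch with assumed notation):

$$ \hat\pi = \arg\max_{\pi} \; \min_{f \in \mathcal F :\, \mathcal E(f, \pi) \le \varepsilon} \; \mathbb E_{s \sim d_0}\big[ f(s, \pi) \big], $$

where $\mathcal F$ is the value-function class, $\mathcal E(f,\pi)$ an empirical Bellman-error functional, and $d_0$ the initial-state distribution: the learner maximizes the most adversarial value estimate that the data cannot rule out.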
Provable benefits of actor-critic methods for offline reinforcement learning
A Zanette, MJ Wainwright… - Advances in neural …, 2021 - proceedings.neurips.cc
Actor-critic methods are widely used in offline reinforcement learning practice, but are not so
well-understood theoretically. We propose a new offline actor-critic algorithm that naturally …
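As a rough sketch of the template analyzed in this line of work: the critic produces a pessimistic evaluation $\hat Q_t$ of the current policy, one satisfying $\hat Q_t \le Q^{\pi_t}$ with high probability, and the actor makes a small incremental update such as the exponentiated-gradient step (the specific update form below is an illustrative assumption)

$$ \pi_{t+1}(a \mid s) \;\propto\; \pi_t(a \mid s)\, \exp\big( \eta\, \hat Q_t(s, a) \big), $$

so that policy improvement is driven only by value that the data can certify.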
Offline RL without off-policy evaluation
D Brandfonbrener, W Whitney… - Advances in neural …, 2021 - proceedings.neurips.cc
Most prior approaches to offline reinforcement learning (RL) have taken an iterative actor-
critic approach involving off-policy evaluation. In this paper we show that simply doing one …
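The "one step" recipe is concrete enough to sketch. Below is a minimal tabular illustration (the function name and the greedy improvement operator are choices made here for brevity; the paper also considers regularized and constrained improvement steps): estimate the behavior policy's Q-function by a SARSA-style fixed point on the logged transitions, then improve the policy exactly once.

    import numpy as np
    from collections import defaultdict

    def one_step_offline_rl(transitions, n_states, n_actions, gamma=0.99, n_sweeps=200):
        """transitions: (s, a, r, s2, a2) tuples logged by the behavior policy."""
        q = np.zeros((n_states, n_actions))
        # Group samples by (s, a) so each update averages over observed outcomes.
        by_sa = defaultdict(list)
        for s, a, r, s2, a2 in transitions:
            by_sa[(s, a)].append((r, s2, a2))
        for _ in range(n_sweeps):
            q_new = q.copy()
            for (s, a), outcomes in by_sa.items():
                # SARSA-style target: evaluates the *behavior* policy (it uses the
                # logged next action a2), so no off-policy evaluation is performed.
                q_new[s, a] = np.mean([r + gamma * q[s2, a2] for r, s2, a2 in outcomes])
            q = q_new
        # A single step of policy improvement against the behavior Q-estimate.
        return q.argmax(axis=1), q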
Pessimistic model-based offline reinforcement learning under partial coverage
We study model-based offline Reinforcement Learning with general function approximation
without a full coverage assumption on the offline data distribution. We present an algorithm …
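The algorithmic template is a max-min over a confidence set of models; sketched with assumed notation:

$$ \hat\pi = \arg\max_{\pi} \; \min_{M \in \mathcal M_{\mathcal D}} V^{\pi}_{M}, $$

where $\mathcal M_{\mathcal D}$ is a set of models consistent with the offline data $\mathcal D$ that contains the true model with high probability, so the inner minimum is a valid lower bound on every policy's true value. Partial coverage then means the guarantee only requires the data distribution to cover the comparator policy, not every candidate policy.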
Settling the sample complexity of model-based offline reinforcement learning
The Annals of Statistics, 2024, Vol. 52, No. 1, 233–260. https://doi.org/10.1214/23-AOS2342
Towards instance-optimal offline reinforcement learning with pessimism
We study the offline reinforcement learning (offline RL) problem, where the goal is to
learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) …
How to leverage unlabeled data in offline reinforcement learning
Offline reinforcement learning (RL) can learn control policies from static datasets but, like
standard RL methods, it requires reward annotations for every transition. In many cases …
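One simple strategy examined in this setting is to merge the unlabeled transitions into the reward-labeled dataset with a reward of zero and run any standard offline RL algorithm on the union. A hedged sketch (the tuple layout is an assumption):

    def merge_with_zero_rewards(labeled, unlabeled):
        """labeled: (s, a, r, s2) tuples with reward annotations.
        unlabeled: (s, a, s2) tuples with no reward annotation."""
        merged = list(labeled)
        # Pessimistic labeling: treat unannotated transitions as zero-reward.
        merged.extend((s, a, 0.0, s2) for (s, a, s2) in unlabeled)
        return merged

The zero label is deliberately pessimistic: it biases values downward on the unlabeled portion while still letting its extra state-action coverage improve the learned policy.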
Mitigating covariate shift in imitation learning via offline data with partial coverage
This paper studies offline Imitation Learning (IL) where an agent learns to imitate an expert
demonstrator without additional online environment interactions. Instead, the learner is …
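The covariate shift in question is the classical one: behavioral cloning fits the expert only on the expert's own state distribution, and small per-step errors compound once the learner drifts to states the expert never visits, which in the standard analysis inflates a per-step error $\varepsilon$ into a regret of order $\varepsilon H^2$ over horizon $H$:

$$ J(\pi_E) - J(\hat\pi) \;\lesssim\; \varepsilon H^2 . $$

Supplementary offline data that partially covers those off-expert states is what lets the learner correct for the drift without any online interaction.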