A review of off-policy evaluation in reinforcement learning
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …
Is pessimism provably efficient for offline rl?
We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …
Provable benefits of actor-critic methods for offline reinforcement learning
A Zanette, MJ Wainwright… - Advances in neural …, 2021 - proceedings.neurips.cc
Actor-critic methods are widely used in offline reinforcement learning practice, but are not so
well-understood theoretically. We propose a new offline actor-critic algorithm that naturally …
Towards instance-optimal offline reinforcement learning with pessimism
We study the offline reinforcement learning (offline RL) problem, where the goal is to
learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) …
Double reinforcement learning for efficient off-policy evaluation in markov decision processes
Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision
policies without needing to conduct exploration, which is often costly or otherwise infeasible …
Minimax-optimal off-policy evaluation with linear function approximation
This paper studies the statistical theory of off-policy evaluation with function approximation in
the batch-data reinforcement learning problem. We consider a regression-based fitted Q …
Mitigating covariate shift in imitation learning via offline data with partial coverage
This paper studies offline Imitation Learning (IL) where an agent learns to imitate an expert
demonstrator without additional online environment interactions. Instead, the learner is …
Near-optimal offline reinforcement learning with linear representation: Leveraging variance information with pessimism
Offline reinforcement learning, which seeks to utilize offline/historical data to optimize
sequential decision-making strategies, has gained surging prominence in recent studies …
Reinforcement learning with guarantees: a review
Reinforcement learning is concerned with a generic concept of an agent acting in an
environment. From the control theory standpoint, reinforcement learning may be considered …
Near-optimal offline reinforcement learning via double variance reduction
We consider the problem of offline reinforcement learning (RL)---a well-motivated setting of
RL that aims at policy optimization using only historical data. Despite its wide applicability …