A review of off-policy evaluation in reinforcement learning

M Uehara, C Shi, N Kallus - arXiv preprint arXiv:2212.06355, 2022 - arxiv.org
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …

Is pessimism provably efficient for offline RL?

Y Jin, Z Yang, Z Wang - International Conference on …, 2021 - proceedings.mlr.press
We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …

Provable benefits of actor-critic methods for offline reinforcement learning

A Zanette, MJ Wainwright… - Advances in neural …, 2021 - proceedings.neurips.cc
Actor-critic methods are widely used in offline reinforcement learning practice, but are not so
well-understood theoretically. We propose a new offline actor-critic algorithm that naturally …

Towards instance-optimal offline reinforcement learning with pessimism

M Yin, YX Wang - Advances in neural information …, 2021 - proceedings.neurips.cc
We study the offline reinforcement learning (offline RL) problem, where the goal is to
learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) …

Double reinforcement learning for efficient off-policy evaluation in Markov decision processes

N Kallus, M Uehara - Journal of Machine Learning Research, 2020 - jmlr.org
Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision
policies without needing to conduct exploration, which is often costly or otherwise infeasible …

Minimax-optimal off-policy evaluation with linear function approximation

Y Duan, Z Jia, M Wang - International Conference on …, 2020 - proceedings.mlr.press
This paper studies the statistical theory of off-policy evaluation with function approximation in
the batch data reinforcement learning problem. We consider a regression-based fitted Q …

Mitigating covariate shift in imitation learning via offline data with partial coverage

J Chang, M Uehara, D Sreenivas… - Advances in Neural …, 2021 - proceedings.neurips.cc
This paper studies offline Imitation Learning (IL) where an agent learns to imitate an expert
demonstrator without additional online environment interactions. Instead, the learner is …

Near-optimal offline reinforcement learning with linear representation: Leveraging variance information with pessimism

M Yin, Y Duan, M Wang, YX Wang - arXiv preprint arXiv:2203.05804, 2022 - arxiv.org
Offline reinforcement learning, which seeks to utilize offline/historical data to optimize
sequential decision-making strategies, has gained surging prominence in recent studies …

Reinforcement learning with guarantees: a review

P Osinenko, D Dobriborsci, W Aumer - IFAC-PapersOnLine, 2022 - Elsevier
Reinforcement learning is concerned with a generic concept of an agent acting in an
environment. From the control theory standpoint, reinforcement learning may be considered …

Near-optimal offline reinforcement learning via double variance reduction

M Yin, Y Bai, YX Wang - Advances in neural information …, 2021 - proceedings.neurips.cc
We consider the problem of offline reinforcement learning (RL), a well-motivated setting of
RL that aims at policy optimization using only historical data. Despite its wide applicability …