A review of off-policy evaluation in reinforcement learning

M Uehara, C Shi, N Kallus - arXiv preprint arXiv:2212.06355, 2022 - arxiv.org
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has recently been applied to solve a number of challenging problems. In this …
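
For context, a minimal sketch of the most basic estimator a survey like this covers: per-trajectory importance sampling, which reweights logged returns by the likelihood ratio of the target and behavior policies. The trajectory format and policy interfaces below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def per_trajectory_is(trajectories, pi_e, pi_b, gamma=0.99):
    """Per-trajectory importance-sampling OPE estimate.

    trajectories: list of [(state, action, reward), ...] logged under pi_b.
    pi_e, pi_b: callables (state, action) -> action probability.
    Returns the average reweighted discounted return under pi_e.
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            weight *= pi_e(s, a) / pi_b(s, a)  # cumulative likelihood ratio
            ret += (gamma ** t) * r
        estimates.append(weight * ret)
    return float(np.mean(estimates))
```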

Bridging offline reinforcement learning and imitation learning: A tale of pessimism

P Rashidinejad, B Zhu, C Ma, J Jiao… - Advances in Neural …, 2021 - proceedings.neurips.cc
Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from
a fixed dataset without active data collection. Based on the composition of the offline dataset …
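
The pessimism of the title refers to penalizing actions the dataset covers poorly. A toy illustration in the multi-armed bandit setting (the 1/sqrt(n) penalty and its constant are assumptions made for this sketch): choose the arm with the best lower confidence bound rather than the best empirical mean.

```python
import numpy as np

def lcb_policy(rewards_by_arm, c=1.0):
    """Pessimistic (lower-confidence-bound) arm selection from a fixed dataset.

    rewards_by_arm: list where entry k holds the observed rewards for arm k.
    Arms with few observations receive a large penalty, so the learner avoids
    committing to poorly covered actions: it behaves like imitation when
    coverage is narrow and like standard offline RL when coverage is broad.
    """
    lcbs = []
    for rewards in rewards_by_arm:
        n = len(rewards)
        if n == 0:
            lcbs.append(-np.inf)  # never observed: maximally pessimistic
        else:
            # penalty ~ c / sqrt(n) shrinks as coverage of the arm grows
            lcbs.append(np.mean(rewards) - c / np.sqrt(n))
    return int(np.argmax(lcbs))
```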

Representation learning for online and offline RL in low-rank MDPs

M Uehara, X Zhang, W Sun - arXiv preprint arXiv:2110.04652, 2021 - arxiv.org
This work studies the question of Representation Learning in RL: how can we learn a
compact low-dimensional representation such that on top of the representation we can …

Pessimistic model-based offline reinforcement learning under partial coverage

M Uehara, W Sun - arXiv preprint arXiv:2107.06226, 2021 - arxiv.org
We study model-based offline reinforcement learning with general function approximation
without a full coverage assumption on the offline data distribution. We present an algorithm …

Provably efficient reinforcement learning in partially observable dynamical systems

M Uehara, A Sekhari, JD Lee… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study reinforcement learning for partially observable systems using function
approximation. We propose a new PO-bilinear framework that is general enough to include …

Finite sample analysis of minimax offline reinforcement learning: Completeness, fast rates and first-order efficiency

M Uehara, M Imaizumi, N Jiang, N Kallus… - arXiv preprint arXiv …, 2021 - arxiv.org
We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement
learning using function approximation for marginal importance weights and q-functions …
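
In estimators of this family, the two nuisances named in the snippet, marginal importance weights and q-functions, are typically combined in a doubly robust form; a generic version (notation assumed here, not taken from the paper) is:

```latex
\[
\hat{J} \;=\; \mathbb{E}_{s_0 \sim d_0}\!\left[\hat{q}\big(s_0, \pi_e(s_0)\big)\right]
\;+\; \frac{1}{n}\sum_{i=1}^{n} \hat{w}(s_i, a_i)
\Big( r_i + \gamma\, \hat{q}\big(s_i', \pi_e(s_i')\big) - \hat{q}(s_i, a_i) \Big),
\]
```

where \(\hat{w}\) estimates the ratio of the target policy's discounted state-action occupancy to the data distribution and \(\hat{q}\) estimates the target policy's q-function; the estimate remains consistent if either nuisance is correct.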

Provable benefits of representational transfer in reinforcement learning

A Agarwal, Y Song, W Sun, K Wang… - The Thirty Sixth …, 2023 - proceedings.mlr.press
We study the problem of representational transfer in RL, where an agent first pretrains in a
number of source tasks to discover a shared representation, which is subsequently …

Future-dependent value-based off-policy evaluation in POMDPs

M Uehara, H Kiyohara, A Bennett… - Advances in …, 2024 - proceedings.neurips.cc
We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general
function approximation. Existing methods such as sequential importance sampling …

Bootstrapping fitted Q-evaluation for off-policy inference

B Hao, X Ji, Y Duan, H Lu… - International …, 2021 - proceedings.mlr.press
Bootstrapping provides a flexible and effective approach for assessing the quality of batch
reinforcement learning, yet its theoretical properties are poorly understood. In this paper, we …
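
The idea sketched below (fqe is a hypothetical callable standing in for any FQE point estimator): resample whole trajectories with replacement, re-run FQE on each resample, and read a percentile confidence interval off the bootstrap distribution.

```python
import numpy as np

def bootstrap_fqe_ci(trajectories, fqe, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval around an FQE point estimate.

    trajectories: list of logged trajectories.
    fqe: callable mapping a list of trajectories to a scalar value estimate
         (hypothetical helper; any OPE point estimator fits here).
    """
    rng = np.random.default_rng(seed)
    n = len(trajectories)
    estimates = []
    for _ in range(n_boot):
        # Resample trajectories, not individual steps, to respect the
        # dependence structure within each episode.
        idx = rng.integers(0, n, size=n)
        resample = [trajectories[i] for i in idx]
        estimates.append(fqe(resample))
    lo, hi = np.quantile(estimates, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```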

Off-policy fitted Q-evaluation with differentiable function approximators: Z-estimation and inference theory

R Zhang, X Zhang, C Ni… - … Conference on Machine …, 2022 - proceedings.mlr.press
Off-policy evaluation (OPE) serves as one of the cornerstones in reinforcement
learning (RL). Fitted Q Evaluation (FQE) with various function approximators, especially …
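
FQE itself is an iterative regression onto Bellman targets under the evaluation policy. A linear-function-approximation sketch (the feature map phi, the transition-tuple format, and the hyperparameters are assumptions; terminal-state handling is omitted):

```python
import numpy as np

def fqe_linear(transitions, phi, pi_e, dim, gamma=0.99, iters=100):
    """Fitted Q-Evaluation with linear function approximation.

    transitions: list of (s, a, r, s_next) tuples from the behavior data.
    phi: feature map (state, action) -> np.ndarray of length dim.
    pi_e: evaluation policy, state -> action.
    Each iteration solves a least-squares regression onto Bellman targets.
    """
    theta = np.zeros(dim)
    X = np.array([phi(s, a) for s, a, _, _ in transitions])
    for _ in range(iters):
        # Bellman target under the evaluation policy: r + gamma * q(s', pi_e(s'))
        y = np.array([r + gamma * phi(sn, pi_e(sn)) @ theta
                      for _, _, r, sn in transitions])
        theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta  # q(s, a) is approximated by phi(s, a) @ theta
```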