A review of off-policy evaluation in reinforcement learning

M Uehara, C Shi, N Kallus - arXiv preprint arXiv:2212.06355, 2022 - arxiv.org
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …

When is realizability sufficient for off-policy reinforcement learning?

A Zanette - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Understanding when reinforcement learning algorithms can make successful off-policy
predictions—and when they may fail to do so—remains an open problem. Typically, model …
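For reference, the realizability condition named in the title is the standard assumption that the chosen function class contains the evaluation policy's value function; a minimal statement (notation mine, not the paper's):

```latex
% Realizability: the function class \mathcal{F} contains the true
% action-value function of the evaluation policy \pi_e.
\[
Q^{\pi_e} \in \mathcal{F},
\qquad
Q^{\pi_e}(s,a) = \mathbb{E}\Bigl[\textstyle\sum_{t\ge 0}\gamma^{t} r_t \,\Bigm|\, s_0=s,\ a_0=a,\ \pi_e\Bigr].
\]
```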

Future-dependent value-based off-policy evaluation in POMDPs

M Uehara, H Kiyohara, A Bennett… - Advances in …, 2024 - proceedings.neurips.cc
We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general
function approximation. Existing methods such as sequential importance sampling …
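The sequential importance sampling baseline the snippet contrasts against can be sketched in a few lines; this is the generic per-decision form, not the paper's future-dependent estimator, and all interface names are illustrative:

```python
import numpy as np

def sis_estimate(trajectories, pi_e, pi_b, gamma=0.99):
    """Per-decision sequential importance sampling estimate of the
    evaluation policy's discounted return from behavior-policy data.

    trajectories: list of [(obs, action, reward), ...] under pi_b.
    pi_e, pi_b: callables (obs, action) -> action probability.
    """
    estimates = []
    for traj in trajectories:
        rho, ret = 1.0, 0.0
        for t, (obs, a, r) in enumerate(traj):
            rho *= pi_e(obs, a) / pi_b(obs, a)  # cumulative likelihood ratio
            ret += rho * gamma**t * r           # reweighted discounted reward
        estimates.append(ret)
    return float(np.mean(estimates))
```

The cumulative ratio rho is what drives the exponential-in-horizon variance that motivates the value-based alternative studied in the paper.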

Off-policy fitted Q-evaluation with differentiable function approximators: Z-estimation and inference theory

R Zhang, X Zhang, C Ni… - … Conference on Machine …, 2022 - proceedings.mlr.press
Off-Policy Evaluation (OPE) serves as one of the cornerstones in Reinforcement
Learning (RL). Fitted Q Evaluation (FQE) with various function approximators, especially …
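FQE itself is a simple iterated-regression scheme; a minimal sketch with a linear function approximator and a deterministic evaluation policy (both simplifying assumptions, with all names illustrative):

```python
import numpy as np

def fitted_q_evaluation(data, pi_e, phi, dim, gamma=0.99, iters=50):
    """Fitted Q Evaluation: repeatedly regress onto one-step Bellman
    backups under the evaluation policy pi_e.

    data: transitions (s, a, r, s_next) collected off-policy.
    phi: feature map (s, a) -> np.ndarray of length dim.
    Returns the weight vector of the fitted linear Q-function.
    """
    X = np.array([phi(s, a) for s, a, _, _ in data])
    # Ridge-regularized normal equations, precomputed once.
    A_inv = np.linalg.inv(X.T @ X + 1e-3 * np.eye(dim))
    w = np.zeros(dim)
    for _ in range(iters):
        # Regression targets: r + gamma * Q_k(s', pi_e(s')).
        y = np.array([r + gamma * phi(s2, pi_e(s2)) @ w
                      for _, _, r, s2 in data])
        w = A_inv @ (X.T @ y)
    return w
```

The paper's contribution concerns inference (asymptotic distributions and confidence intervals) for estimators of this form with differentiable approximators, which the sketch does not cover.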

Loss dynamics of temporal difference reinforcement learning

B Bordelon, P Masset, H Kuo… - Advances in Neural …, 2024 - proceedings.neurips.cc
Reinforcement learning has been successful across several applications in which agents
have to learn to act in environments with sparse feedback. However, despite this empirical …
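For context, the temporal difference update whose learning dynamics the paper analyzes is the standard TD(0) rule; a tabular sketch with an illustrative environment interface:

```python
import numpy as np

def td0_policy_evaluation(reset, step, policy, n_states,
                          episodes=500, alpha=0.1, gamma=0.99):
    """Tabular TD(0): move V(s) toward the bootstrapped one-step target.

    reset: () -> initial state (int in [0, n_states)).
    step: (s, a) -> (s_next, reward, done).
    policy: s -> a.
    """
    V = np.zeros(n_states)
    for _ in range(episodes):
        s, done = reset(), False
        while not done:
            s2, r, done = step(s, policy(s))
            target = r + (0.0 if done else gamma * V[s2])
            V[s] += alpha * (target - V[s])  # TD-error step
            s = s2
    return V
```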

Beyond the return: Off-policy function estimation under user-specified error-measuring distributions

A Huang, N Jiang - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Off-policy evaluation often refers to two related tasks: estimating the expected return of a
policy and estimating its value function (or other functions of interest, such as density ratios) …
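The two tasks the snippet distinguishes can be written side by side (notation illustrative):

```latex
% Return estimation targets a scalar; value-function estimation
% targets the whole function V^{\pi_e}.
\[
J(\pi_e) = \mathbb{E}_{s_0 \sim d_0}\bigl[V^{\pi_e}(s_0)\bigr],
\qquad
V^{\pi_e}(s) = \mathbb{E}\Bigl[\textstyle\sum_{t\ge 0}\gamma^{t} r_t \,\Bigm|\, s_0 = s,\ \pi_e\Bigr].
\]
```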

Robust offline policy evaluation and optimization with heavy-tailed rewards

J Zhu, R Wan, Z Qi, S Luo, C Shi - arXiv preprint arXiv:2310.18715, 2023 - arxiv.org
This paper endeavors to augment the robustness of offline reinforcement learning (RL) in
scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world …
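One standard heavy-tail-robust device in this literature is the median-of-means estimator, which replaces the sample mean of returns; the sketch below illustrates that generic idea, not necessarily the paper's specific method:

```python
import numpy as np

def median_of_means(returns, n_blocks=10, seed=None):
    """Median-of-means: split samples into blocks, average within each
    block, and return the median of the block means. Much less
    sensitive to heavy-tailed outliers than the plain sample mean."""
    rng = np.random.default_rng(seed)
    x = rng.permutation(np.asarray(returns, dtype=float))
    return float(np.median([b.mean() for b in np.array_split(x, n_blocks)]))
```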

Off-policy evaluation in doubly inhomogeneous environments

Z Bian, C Shi, Z Qi, L Wang - Journal of the American Statistical …, 2024 - Taylor & Francis
This work aims to study off-policy evaluation (OPE) under scenarios where two key
reinforcement learning (RL) assumptions—temporal stationarity and individual homogeneity …
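The two assumptions being relaxed can be stated for per-individual, per-time transition kernels P_t^(i) (notation illustrative):

```latex
% Temporal stationarity and individual homogeneity, respectively:
\[
P_t^{(i)} = P_{t'}^{(i)} \ \ \forall\, t, t'
\quad\text{(stationarity)},
\qquad
P_t^{(i)} = P_t^{(j)} \ \ \forall\, i, j
\quad\text{(homogeneity)}.
\]
```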

The optimal approximation factors in misspecified off-policy value function estimation

P Amortila, N Jiang… - … Conference on Machine …, 2023 - proceedings.mlr.press
Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative
blow-up factors with respect to the misspecification error of function approximation. Yet, the …
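A generic form of the guarantees being discussed, to make "multiplicative blow-up" concrete (illustrative, not the paper's exact statement):

```latex
% The estimation error inflates the best-in-class (misspecification)
% error by an approximation factor \alpha \ge 1, plus statistical error:
\[
\bigl\| \hat{Q} - Q^{\pi} \bigr\|
\;\le\;
\alpha \cdot \inf_{f \in \mathcal{F}} \bigl\| f - Q^{\pi} \bigr\|
\;+\; \varepsilon_{\mathrm{stat}}.
\]
```

The paper's question is which approximation factors alpha are achievable and optimal under different norms and data distributions.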

On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation

Y Zhang, N Jiang - arXiv preprint arXiv:2402.14703, 2024 - arxiv.org
We study off-policy evaluation (OPE) in partially observable environments with complex
observations, with the goal of developing estimators whose guarantee avoids exponential …