A review of off-policy evaluation in reinforcement learning
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …
When is realizability sufficient for off-policy reinforcement learning?
A Zanette - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Understanding when reinforcement learning algorithms can make successful off-policy
predictions—and when they may fail to do so—remains an open problem. Typically, model …
Future-dependent value-based off-policy evaluation in POMDPs
We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general
function approximation. Existing methods such as sequential importance sampling …
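The sequential importance sampling mentioned above is the classical baseline these papers build on. As a hedged illustration (not from any of the listed papers), a minimal per-decision importance-sampling estimator of a target policy's return from behavior-policy trajectories might look like this; the function names and trajectory format are assumptions for the sketch:

```python
import numpy as np

def sequential_is_estimate(trajectories, pi_e, pi_b, gamma=0.99):
    """Per-decision sequential importance sampling estimate of the
    target policy's expected discounted return from behavior data.

    trajectories: list of trajectories, each [(state, action, reward), ...]
    pi_e(a, s): target-policy probability of action a in state s
    pi_b(a, s): behavior-policy probability of action a in state s
    (hypothetical interfaces chosen for this sketch)
    """
    estimates = []
    for traj in trajectories:
        rho = 1.0   # cumulative importance weight along the trajectory
        ret = 0.0   # importance-weighted discounted return
        for t, (s, a, r) in enumerate(traj):
            rho *= pi_e(a, s) / pi_b(a, s)
            ret += (gamma ** t) * rho * r
        estimates.append(ret)
    return float(np.mean(estimates))
```

The cumulative weight `rho` is what makes the estimator unbiased but also what drives its variance to grow exponentially with horizon, which is exactly the failure mode the future-dependent value-function approaches in this list try to avoid.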
Off-policy fitted Q-evaluation with differentiable function approximators: Z-estimation and inference theory
Abstract Off-Policy Evaluation (OPE) serves as one of the cornerstones in Reinforcement
Learning (RL). Fitted Q Evaluation (FQE) with various function approximators, especially …
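For readers unfamiliar with FQE: it estimates the target policy's Q-function by repeatedly regressing onto bootstrapped targets built from the data. A minimal tabular sketch (an illustration under assumed interfaces, not the differentiable-approximator setting the paper analyzes):

```python
import numpy as np

def fitted_q_evaluation(transitions, pi_e, n_states, n_actions,
                        gamma=0.99, n_iters=50):
    """Tabular Fitted Q Evaluation: at each iteration, regress Q(s, a)
    onto the bootstrapped target r + gamma * Q(s', pi_e(s')).

    transitions: list of (s, a, r, s_next, done) from any behavior policy
    pi_e(s): action the evaluated policy takes in state s
    (hypothetical data format chosen for this sketch)
    """
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        targets = np.zeros_like(Q)
        counts = np.zeros_like(Q)
        for s, a, r, s2, done in transitions:
            y = r if done else r + gamma * Q[s2, pi_e(s2)]
            targets[s, a] += y
            counts[s, a] += 1
        # "fit" step: in the tabular case, least squares reduces to
        # averaging the targets observed for each (s, a) cell
        mask = counts > 0
        Q[mask] = targets[mask] / counts[mask]
    return Q
```

With a function approximator, the averaging step is replaced by a regression fit, which is where the paper's Z-estimation and inference theory come in.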
Loss dynamics of temporal difference reinforcement learning
Reinforcement learning has been successful across several applications in which agents
have to learn to act in environments with sparse feedback. However, despite this empirical …
Beyond the return: Off-policy function estimation under user-specified error-measuring distributions
Off-policy evaluation often refers to two related tasks: estimating the expected return of a
policy and estimating its value function (or other functions of interest, such as density ratios) …
Robust offline policy evaluation and optimization with heavy-tailed rewards
This paper endeavors to augment the robustness of offline reinforcement learning (RL) in
scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world …
Off-policy evaluation in doubly inhomogeneous environments
Z Bian, C Shi, Z Qi, L Wang - Journal of the American Statistical …, 2024 - Taylor & Francis
This work aims to study off-policy evaluation (OPE) under scenarios where two key
reinforcement learning (RL) assumptions—temporal stationarity and individual homogeneity …
The optimal approximation factors in misspecified off-policy value function estimation
P Amortila, N Jiang… - … Conference on Machine …, 2023 - proceedings.mlr.press
Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative
blow-up factors with respect to the misspecification error of function approximation. Yet, the …
On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation
We study off-policy evaluation (OPE) in partially observable environments with complex
observations, with the goal of developing estimators whose guarantee avoids exponential …