Policy-adaptive estimator selection for off-policy evaluation
Off-policy evaluation (OPE) aims to accurately evaluate the performance of counterfactual
policies using only offline logged data. Although many estimators have been developed …
Off-policy evaluation with deficient support using side information
N Felicioni, M Ferrari Dacrema… - Advances in …, 2022 - proceedings.neurips.cc
The Off-Policy Evaluation (OPE) problem consists of evaluating the performance of
new policies from the data collected by another one. OPE is crucial when evaluating a new …
An instrumental variable approach to confounded off-policy evaluation
Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-
collected observational data generated by a potentially different behavior policy. In many …
Adaptive estimator selection for off-policy evaluation
Y Su, P Srinath… - … Conference on Machine …, 2020 - proceedings.mlr.press
We develop a generic data-driven method for estimator selection in off-policy policy
evaluation settings. We establish a strong performance guarantee for the method, showing …
Efficiently breaking the curse of horizon in off-policy evaluation with double reinforcement learning
Off-policy evaluation (OPE) in reinforcement learning is notoriously difficult in long- and
infinite-horizon settings due to diminishing overlap between behavior and target policies. In …
Accountable off-policy evaluation with kernel bellman statistics
We consider off-policy evaluation (OPE), which evaluates the performance of a new policy
from observed data collected from previous experiments, without requiring the execution of …
from observed data collected from previous experiments, without requiring the execution of …
Using options and covariance testing for long horizon off-policy policy evaluation
Evaluating a policy by deploying it in the real world can be risky and costly. Off-policy policy
evaluation (OPE) algorithms use historical data collected from running a previous policy to …
Understanding the curse of horizon in off-policy evaluation via conditional importance sampling
Off-policy policy estimators that use importance sampling (IS) can suffer from high variance
in long-horizon domains, and there has been particular excitement over new IS methods that …
Minimax value interval for off-policy evaluation and policy optimization
We study minimax methods for off-policy evaluation (OPE) using value functions and
marginalized importance weights. Despite that they hold promises of overcoming the …