Behaviour policy estimation in off-policy policy evaluation: Calibration matters
In this work, we consider the problem of estimating a behaviour policy for use in Off-Policy
Policy Evaluation (OPE) when the true behaviour policy is unknown. Via a series of …
Deeply-debiased off-policy interval estimation
Off-policy evaluation learns a target policy's value with a historical dataset generated by a
different behavior policy. In addition to a point estimate, many applications would benefit …
An instrumental variable approach to confounded off-policy evaluation
Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-
collected observational data generated by a potentially different behavior policy. In many …
Accountable off-policy evaluation with kernel Bellman statistics
We consider off-policy evaluation (OPE), which evaluates the performance of a new policy
from observed data collected from previous experiments, without requiring the execution of …
When is off-policy evaluation useful? A data-centric perspective
Evaluating the value of a hypothetical target policy with only a logged dataset is important
but challenging. On the one hand, it brings opportunities for safe policy improvement under …
Off-policy evaluation with deficient support using side information
N Felicioni, M Ferrari Dacrema… - Advances in …, 2022 - proceedings.neurips.cc
The Off-Policy Evaluation (OPE) problem consists of evaluating the performance of
new policies using data collected by another policy. OPE is crucial when evaluating a new …
Non-asymptotic confidence intervals of off-policy evaluation: Primal and dual bounds
Off-policy evaluation (OPE) is the task of estimating the expected reward of a given policy
based on offline data previously collected under different policies. Therefore, OPE is a key …
Policy-adaptive estimator selection for off-policy evaluation
Off-policy evaluation (OPE) aims to accurately evaluate the performance of counterfactual
policies using only offline logged data. Although many estimators have been developed …
Understanding the curse of horizon in off-policy evaluation via conditional importance sampling
Off-policy policy estimators that use importance sampling (IS) can suffer from high variance
in long-horizon domains, and there has been particular excitement over new IS methods that …
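The variance problem this entry describes is easy to see numerically: a trajectory-wise importance weight is a product of per-step likelihood ratios, so its variance grows exponentially with the horizon. The following is a minimal sketch (not taken from any of the listed papers) using a toy two-action problem with a uniform behavior policy and a hypothetical target policy that picks action 0 with probability 0.9; all probabilities are illustrative assumptions.

```python
import random

def is_weight(actions, pi_t, pi_b):
    """Trajectory importance weight: product of per-step ratios pi_t(a) / pi_b(a)."""
    w = 1.0
    for a in actions:
        w *= pi_t(a) / pi_b(a)
    return w

def weight_variance(horizon, n=10_000, seed=0):
    """Empirical variance of the IS weight over n sampled action sequences.

    Toy setup (illustrative only): behavior policy uniform over two actions,
    target policy favors action 0 with probability 0.9.
    """
    rng = random.Random(seed)
    pi_b = lambda a: 0.5
    pi_t = lambda a: 0.9 if a == 0 else 0.1
    ws = []
    for _ in range(n):
        actions = [0 if rng.random() < 0.5 else 1 for _ in range(horizon)]
        ws.append(is_weight(actions, pi_t, pi_b))
    mean = sum(ws) / n
    return sum((w - mean) ** 2 for w in ws) / n

if __name__ == "__main__":
    # The variance of the weight blows up as the horizon grows --
    # the "curse of horizon" that motivates stationary-distribution IS methods.
    for h in (1, 5, 20):
        print(h, weight_variance(h))
```

For this setup the per-step second moment of the ratio is 0.5 · 1.8² + 0.5 · 0.2² = 1.64, so the weight's variance is 1.64^H − 1, which the empirical estimates track (noisily, since long-horizon weights are heavy-tailed).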
Distributional shift-aware off-policy interval estimation: A unified error quantification framework
We study high-confidence off-policy evaluation in the context of infinite-horizon Markov
decision processes, where the objective is to establish a confidence interval (CI) for the …