Towards robust off-policy evaluation via human inputs
Off-policy Evaluation (OPE) methods are crucial tools for evaluating policies in high-stakes
domains such as healthcare, where direct deployment is often infeasible, unethical, or …
Triply robust off-policy evaluation
We propose a robust regression approach to off-policy evaluation (OPE) for contextual
bandits. We frame OPE as a covariate-shift problem and leverage modern robust regression …
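Viewing OPE through the covariate-shift lens suggests a simple construction: fit a reward model on the logged data with importance weights that tilt the (context, action) distribution toward the one the target policy would induce, then average the model's predictions under the target policy. The sketch below illustrates that idea with a Huber-loss regressor standing in for "robust regression"; the helper names (`pi_target`, `mu_logged`) and the overall recipe are illustrative assumptions, not the paper's exact method.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor

def weighted_direct_method(X, a, r, pi_target, mu_logged, n_actions):
    """Direct-method OPE with covariate-shift correction (illustrative sketch).

    X: (n, d) contexts; a: (n,) logged actions; r: (n,) rewards.
    pi_target(x, act) and mu_logged(x, act): action probabilities under the
    target and logging policies (assumed known or pre-estimated).
    """
    n = len(r)
    # Importance weights shift the logged (context, action) distribution
    # toward the distribution the target policy would induce.
    w = np.array([pi_target(X[i], a[i]) / mu_logged(X[i], a[i]) for i in range(n)])
    # Robust regression (Huber loss) limits the influence of outlier rewards.
    feats = np.hstack([X, np.eye(n_actions)[a]])  # context + one-hot action
    model = HuberRegressor().fit(feats, r, sample_weight=w)
    # Evaluate the fitted reward model under the target policy.
    value = 0.0
    for i in range(n):
        for act in range(n_actions):
            f = np.concatenate([X[i], np.eye(n_actions)[act]])
            value += pi_target(X[i], act) * model.predict(f[None])[0]
    return value / n
```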
Data poisoning attacks on off-policy policy evaluation methods
Off-policy Evaluation (OPE) methods are a crucial tool for evaluating policies in high-stakes
domains such as healthcare, where exploration is often infeasible, unethical, or expensive …
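Why OPE estimators are attractive poisoning targets is easy to demonstrate. The toy example below (not the paper's attack, just an assumed illustration of the vulnerability class) corrupts rewards on only the 0.5% of logged samples carrying the largest importance weights, yet shifts a plain IPS estimate substantially:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
w = rng.lognormal(mean=0.0, sigma=1.5, size=n)   # importance weights pi/mu
r = rng.binomial(1, 0.5, size=n).astype(float)   # logged binary rewards

ips = np.mean(w * r)

# Adversary flips rewards on only the k samples with the largest weights;
# IPS moves far more than the 0.5% corruption budget would suggest.
k = n // 200
idx = np.argsort(w)[-k:]
r_poisoned = r.copy()
r_poisoned[idx] = 1.0
ips_poisoned = np.mean(w * r_poisoned)
print(f"clean IPS: {ips:.3f}, poisoned IPS: {ips_poisoned:.3f}")
```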
When is Off-Policy Evaluation Useful? A Data-Centric Perspective
Evaluating the value of a hypothetical target policy with only a logged dataset is important
but challenging. On the one hand, it brings opportunities for safe policy improvement under …
Off-policy evaluation via adaptive weighting with data from contextual bandits
It has become increasingly common for data to be collected adaptively, for example using
contextual bandits. Historical data of this type can be used to evaluate other treatment …
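With adaptively collected data the logging propensities drift over time, which inflates the variance of ordinary importance weighting. The paper develops adaptive weighting schemes for this setting; as a simpler, assumed point of comparison, self-normalizing the weights is one standard stabilization:

```python
import numpy as np

def snips(pi_t, mu_t, rewards):
    """Self-normalized IPS: one simple way to stabilize importance weights
    when logging propensities mu_t vary over time (e.g., an adaptive bandit).
    This is a standard baseline, not the paper's adaptive-weighting scheme.

    pi_t, mu_t: target / logging probabilities of the logged actions.
    """
    w = np.asarray(pi_t) / np.asarray(mu_t)
    return float(np.sum(w * np.asarray(rewards)) / np.sum(w))
```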
Open bandit dataset and pipeline: Towards realistic and reproducible off-policy evaluation
Off-policy evaluation (OPE) aims to estimate the performance of hypothetical policies using
data generated by a different policy. Because of its huge potential impact in practice, there …
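The accompanying Open Bandit Pipeline (`obp`) packages this workflow. The sketch below follows the library's documented quickstart (evaluating a Bernoulli Thompson Sampling policy from logs collected by a random policy); exact signatures may differ across `obp` versions.

```python
from obp.dataset import OpenBanditDataset
from obp.policy import BernoulliTS
from obp.ope import OffPolicyEvaluation, InverseProbabilityWeighting as IPW

# Logged feedback collected by the Random policy on the "all" campaign.
dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
bandit_feedback = dataset.obtain_batch_bandit_feedback()

# Action distribution of the counterfactual policy to be evaluated.
evaluation_policy = BernoulliTS(
    n_actions=dataset.n_actions,
    len_list=dataset.len_list,
    is_zozotown_prior=True,
    campaign="all",
    random_state=12345,
)
action_dist = evaluation_policy.compute_batch_action_dist(
    n_sim=100_000, n_rounds=bandit_feedback["n_rounds"]
)

# Estimate the policy value of BernoulliTS from the Random policy's logs.
ope = OffPolicyEvaluation(bandit_feedback=bandit_feedback, ope_estimators=[IPW()])
estimates = ope.estimate_policy_values(action_dist=action_dist)
```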
Off-policy evaluation of slate bandit policies via optimizing abstraction
We study off-policy evaluation (OPE) in the problem of slate contextual bandits where a
policy selects multi-dimensional actions known as slates. This problem is widespread in …
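The difficulty the abstraction approach targets is visible in the standard slate IPS baseline: with factored policies, the slate-level importance weight is a product over slots, so its variance explodes as the slate grows. A minimal sketch of that baseline (assumed here for contrast; it is not the paper's abstraction-based estimator):

```python
import numpy as np

def slate_ips(pi_slot, mu_slot, rewards):
    """Plain slate IPS for factored policies.

    pi_slot, mu_slot: (n, L) per-slot probabilities of the logged slate
    under the target and logging policies; rewards: (n,) slate rewards.
    The full-slate weight prod_l pi_l/mu_l grows multiplicatively with
    slate length L, which is what abstraction-based estimators try to avoid.
    """
    w = np.prod(np.asarray(pi_slot) / np.asarray(mu_slot), axis=1)
    return float(np.mean(w * np.asarray(rewards)))
```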
Adaptive estimator selection for off-policy evaluation
We develop a generic data-driven method for estimator selection in off-policy policy
evaluation settings. We establish a strong performance guarantee for the method, showing …
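A Lepski-style selection rule conveys the flavor of such data-driven methods: order candidate estimators so their confidence intervals shrink (variance falls) while unobserved bias may grow, then keep the last estimator whose interval still overlaps all earlier ones. The sketch below is an assumed simplification in this spirit, not the paper's exact procedure:

```python
import numpy as np

def lepski_select(estimates, ci_widths):
    """Select an estimator index via interval intersection.

    estimates: point estimates ordered by decreasing ci_widths
    (i.e., decreasing variance, potentially increasing bias).
    """
    lo = np.asarray(estimates) - np.asarray(ci_widths)
    hi = np.asarray(estimates) + np.asarray(ci_widths)
    chosen = 0
    for i in range(1, len(estimates)):
        # Interval i must intersect every earlier interval to remain trusted;
        # the first failure signals that bias has begun to dominate.
        if all(lo[i] <= hi[j] and lo[j] <= hi[i] for j in range(i)):
            chosen = i
        else:
            break
    return chosen
```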
Doubly robust policy evaluation and learning
We study decision making in environments where the reward is only partially observed, but
can be modeled as a function of an action and an observed context. This setting, known as …
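The doubly robust estimator combines a fitted reward model with an importance-weighted correction on the logged action, and is consistent if either component is correct. A minimal sketch of the standard construction (helper names such as `q_hat`, `pi_target`, and `mu_logged` are illustrative):

```python
import numpy as np

def doubly_robust(X, a, r, pi_target, mu_logged, q_hat, n_actions):
    """Standard doubly robust OPE estimator.

    q_hat(x, act): fitted reward model; pi_target / mu_logged: action
    probabilities under the target and logging policies. The estimate is
    consistent if either q_hat or mu_logged is correct, hence "doubly" robust.
    """
    n = len(r)
    total = 0.0
    for i in range(n):
        # Direct-method term: model value under the target policy.
        dm = sum(pi_target(X[i], act) * q_hat(X[i], act) for act in range(n_actions))
        # Importance-weighted correction on the logged action.
        w = pi_target(X[i], a[i]) / mu_logged(X[i], a[i])
        total += dm + w * (r[i] - q_hat(X[i], a[i]))
    return total / n
```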
Optimal and adaptive off-policy evaluation in contextual bandits
We study the off-policy evaluation problem—estimating the value of a target policy using
data collected by another policy—under the contextual bandit model. We consider the …
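One adaptive idea from this line of work is to switch between estimators per action: use importance weighting where the weight is small and fall back to the reward model where it is large. The sketch below follows the spirit of the paper's SWITCH estimator, with assumed helper names and a threshold `tau` chosen by the user:

```python
import numpy as np

def switch_estimator(X, a, r, pi_target, mu_logged, q_hat, n_actions, tau):
    """Switch-style OPE: importance weighting below the weight threshold tau,
    fitted reward model above it (illustrative sketch, not a verbatim
    implementation from the paper).
    """
    n = len(r)
    total = 0.0
    for i in range(n):
        # Model term over actions whose importance weight exceeds tau.
        for act in range(n_actions):
            rho = pi_target(X[i], act) / mu_logged(X[i], act)
            if rho > tau:
                total += pi_target(X[i], act) * q_hat(X[i], act)
        # IPS term on the logged action, kept only if its weight is small.
        rho_i = pi_target(X[i], a[i]) / mu_logged(X[i], a[i])
        if rho_i <= tau:
            total += rho_i * r[i]
    return total / n
```

Small `tau` recovers the direct method and large `tau` recovers IPS, so the threshold trades bias against variance.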