Towards robust off-policy evaluation via human inputs

H Singh, S Joshi, F Doshi-Velez… - Proceedings of the 2022 …, 2022 - dl.acm.org
Off-policy Evaluation (OPE) methods are crucial tools for evaluating policies in high-stakes
domains such as healthcare, where direct deployment is often infeasible, unethical, or …

Triply robust off-policy evaluation

A Liu, H Liu, A Anandkumar, Y Yue - arXiv preprint arXiv:1911.05811, 2019 - arxiv.org
We propose a robust regression approach to off-policy evaluation (OPE) for contextual
bandits. We frame OPE as a covariate-shift problem and leverage modern robust regression …

Data poisoning attacks on off-policy policy evaluation methods

E Lobo, H Singh, M Petrik, C Rudin… - Uncertainty in …, 2022 - proceedings.mlr.press
Off-policy Evaluation (OPE) methods are a crucial tool for evaluating policies in high-stakes
domains such as healthcare, where exploration is often infeasible, unethical, or expensive …

When is Off-Policy Evaluation Useful? A Data-Centric Perspective

H Sun, AJ Chan, N Seedat, A Hüyük… - arXiv preprint arXiv …, 2023 - arxiv.org
Evaluating the value of a hypothetical target policy with only a logged dataset is important
but challenging. On the one hand, it brings opportunities for safe policy improvement under …

Off-policy evaluation via adaptive weighting with data from contextual bandits

R Zhan, V Hadad, DA Hirshberg, S Athey - Proceedings of the 27th ACM …, 2021 - dl.acm.org
It has become increasingly common for data to be collected adaptively, for example using
contextual bandits. Historical data of this type can be used to evaluate other treatment …

Open bandit dataset and pipeline: Towards realistic and reproducible off-policy evaluation

Y Saito, S Aihara, M Matsutani, Y Narita - arXiv preprint arXiv:2008.07146, 2020 - arxiv.org
Off-policy evaluation (OPE) aims to estimate the performance of hypothetical policies using
data generated by a different policy. Because of its huge potential impact in practice, there …
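As a minimal sketch of the OPE problem this dataset targets (illustrative names only, not the Open Bandit Pipeline API), the standard inverse propensity scoring (IPS) estimator reweights each logged reward by the ratio of target to logging propensities:

```python
import numpy as np

def ips_value(rewards, logging_probs, target_probs):
    """Inverse propensity scoring (IPS) estimate of a target policy's value.

    rewards:       observed rewards for the logged actions, shape (n,)
    logging_probs: probability of each logged action under the logging policy
    target_probs:  probability of the same logged action under the target policy
    """
    weights = target_probs / logging_probs   # importance weights pi(a|x) / mu(a|x)
    return float(np.mean(weights * rewards))
```

When the target and logging policies coincide, the weights are all one and the estimate reduces to the empirical mean reward; the variance grows as the two policies diverge.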

Off-policy evaluation of slate bandit policies via optimizing abstraction

H Kiyohara, M Nomura, Y Saito - arXiv preprint arXiv:2402.02171, 2024 - arxiv.org
We study off-policy evaluation (OPE) in the problem of slate contextual bandits where a
policy selects multi-dimensional actions known as slates. This problem is widespread in …

Adaptive estimator selection for off-policy evaluation

Y Su, P Srinath… - … Conference on Machine …, 2020 - proceedings.mlr.press
We develop a generic data-driven method for estimator selection in off-policy policy
evaluation settings. We establish a strong performance guarantee for the method, showing …

Doubly robust policy evaluation and learning

M Dudík, J Langford, L Li - arXiv preprint arXiv:1103.4601, 2011 - arxiv.org
We study decision making in environments where the reward is only partially observed, but
can be modeled as a function of an action and an observed context. This setting, known as …
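As a hedged illustration of the doubly robust idea for contextual bandits (a generic sketch, not the authors' exact implementation; all names below are hypothetical), the estimator combines a direct reward-model term with an importance-weighted correction on the logged actions:

```python
import numpy as np

def doubly_robust_value(contexts, actions, rewards, logging_probs,
                        target_probs, reward_model):
    """Doubly robust OPE estimate for a contextual bandit.

    contexts:      observed contexts, shape (n, d)
    actions:       logged actions, shape (n,)
    rewards:       observed rewards, shape (n,)
    logging_probs: P(a_i | x_i) under the logging policy, shape (n,)
    target_probs:  P(a | x_i) under the target policy, shape (n, k)
    reward_model:  callable(contexts) -> predicted rewards, shape (n, k)
    """
    n, k = target_probs.shape
    q_hat = reward_model(contexts)                     # model-based reward predictions
    direct = np.sum(target_probs * q_hat, axis=1)      # direct-method term
    idx = np.arange(n)
    w = target_probs[idx, actions] / logging_probs     # importance weights on logged actions
    correction = w * (rewards - q_hat[idx, actions])   # reweighted model residual
    return float(np.mean(direct + correction))
```

The estimate is unbiased if either the reward model or the logging propensities are correct, which is the "doubly robust" property the paper's title refers to.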

Optimal and adaptive off-policy evaluation in contextual bandits

YX Wang, A Agarwal, M Dudík - International Conference on …, 2017 - proceedings.mlr.press
We study the off-policy evaluation problem—estimating the value of a target policy using
data collected by another policy—under the contextual bandit model. We consider the …
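One concrete form the adaptivity studied here can take is a SWITCH-style rule: trust importance weighting only where the weights are small, and fall back to a reward model elsewhere. The sketch below is an illustration under that assumption, with hypothetical names, not the paper's reference code:

```python
import numpy as np

def switch_value(rewards, actions, logging_full, target_full, q_hat, tau):
    """SWITCH-style OPE: IPS where importance weights are <= tau,
    model-based (direct method) elsewhere.

    rewards:      observed rewards, shape (n,)
    actions:      logged actions, shape (n,)
    logging_full: logging-policy probabilities for all actions, shape (n, k)
    target_full:  target-policy probabilities for all actions, shape (n, k)
    q_hat:        reward-model predictions for all actions, shape (n, k)
    tau:          weight threshold controlling the bias-variance trade-off
    """
    n, k = target_full.shape
    w_full = target_full / logging_full            # weights for every (context, action)
    small = w_full <= tau                          # where IPS is trusted
    idx = np.arange(n)
    # IPS part: logged pairs whose weight is below the threshold
    w_logged = w_full[idx, actions]
    ips_part = np.where(small[idx, actions], w_logged * rewards, 0.0)
    # Direct-method part: model-based value over the large-weight actions
    dm_part = np.sum(target_full * q_hat * (~small), axis=1)
    return float(np.mean(ips_part + dm_part))
```

Setting tau very large recovers plain IPS, while tau = 0 recovers the direct method; choosing tau adaptively is the kind of optimality question the paper addresses.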