Towards robust off-policy evaluation via human inputs
Off-policy Evaluation (OPE) methods are crucial tools for evaluating policies in high-stakes
domains such as healthcare, where direct deployment is often infeasible, unethical, or …
Triply robust off-policy evaluation
We propose a robust regression approach to off-policy evaluation (OPE) for contextual
bandits. We frame OPE as a covariate-shift problem and leverage modern robust regression …
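Viewing OPE through the covariate-shift lens suggests a simple construction: fit a reward model on the logged data with importance weights that tilt the (context, action) distribution toward the one the target policy would induce, then average the model's predictions under the target policy. The sketch below illustrates that idea with a Huber-loss regressor standing in for "robust regression"; the helper names (`pi_target`, `mu_logged`) and the overall recipe are illustrative assumptions, not the paper's exact method.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor

def weighted_direct_method(X, a, r, pi_target, mu_logged, n_actions):
    """Direct-method OPE with covariate-shift correction (illustrative sketch).

    X: (n, d) contexts; a: (n,) logged actions; r: (n,) rewards.
    pi_target(x, act) and mu_logged(x, act): action probabilities under the
    target and logging policies (assumed known or pre-estimated).
    """
    n = len(r)
    # Importance weights shift the logged (context, action) distribution
    # toward the distribution the target policy would induce.
    w = np.array([pi_target(X[i], a[i]) / mu_logged(X[i], a[i]) for i in range(n)])
    # Robust regression (Huber loss) limits the influence of outlier rewards.
    feats = np.hstack([X, np.eye(n_actions)[a]])  # context + one-hot action
    model = HuberRegressor().fit(feats, r, sample_weight=w)
    # Evaluate the fitted reward model under the target policy.
    value = 0.0
    for i in range(n):
        for act in range(n_actions):
            f = np.concatenate([X[i], np.eye(n_actions)[act]])
            value += pi_target(X[i], act) * model.predict(f[None])[0]
    return value / n
```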
Data poisoning attacks on off-policy policy evaluation methods
Off-policy Evaluation (OPE) methods are a crucial tool for evaluating policies in high-stakes
domains such as healthcare, where exploration is often infeasible, unethical, or expensive …
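Why OPE estimators are attractive poisoning targets is easy to demonstrate. The toy example below (not the paper's attack, just an assumed illustration of the vulnerability class) corrupts rewards on only the 0.5% of logged samples carrying the largest importance weights, yet shifts a plain IPS estimate substantially:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
w = rng.lognormal(mean=0.0, sigma=1.5, size=n)   # importance weights pi/mu
r = rng.binomial(1, 0.5, size=n).astype(float)   # logged binary rewards

ips = np.mean(w * r)

# Adversary flips rewards on only the k samples with the largest weights;
# IPS moves far more than the 0.5% corruption budget would suggest.
k = n // 200
idx = np.argsort(w)[-k:]
r_poisoned = r.copy()
r_poisoned[idx] = 1.0
ips_poisoned = np.mean(w * r_poisoned)
print(f"clean IPS: {ips:.3f}, poisoned IPS: {ips_poisoned:.3f}")
```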
When is Off-Policy Evaluation Useful? A Data-Centric Perspective
Evaluating the value of a hypothetical target policy with only a logged dataset is important
but challenging. On the one hand, it brings opportunities for safe policy improvement under …
Off-policy evaluation via adaptive weighting with data from contextual bandits
It has become increasingly common for data to be collected adaptively, for example using
contextual bandits. Historical data of this type can be used to evaluate other treatment …
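With adaptively collected data the logging propensities drift over time, which inflates the variance of ordinary importance weighting. The paper develops adaptive weighting schemes for this setting; as a simpler, assumed point of comparison, self-normalizing the weights is one standard stabilization:

```python
import numpy as np

def snips(pi_t, mu_t, rewards):
    """Self-normalized IPS: one simple way to stabilize importance weights
    when logging propensities mu_t vary over time (e.g., an adaptive bandit).
    This is a standard baseline, not the paper's adaptive-weighting scheme.

    pi_t, mu_t: target / logging probabilities of the logged actions.
    """
    w = np.asarray(pi_t) / np.asarray(mu_t)
    return float(np.sum(w * np.asarray(rewards)) / np.sum(w))
```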
Open bandit dataset and pipeline: Towards realistic and reproducible off-policy evaluation
Off-policy evaluation (OPE) aims to estimate the performance of hypothetical policies using
data generated by a different policy. Because of its huge potential impact in practice, there …
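The accompanying Open Bandit Pipeline (`obp`) packages this workflow. The sketch below follows the library's documented quickstart (evaluating a Bernoulli Thompson Sampling policy from logs collected by a random policy); exact signatures may differ across `obp` versions.

```python
from obp.dataset import OpenBanditDataset
from obp.policy import BernoulliTS
from obp.ope import OffPolicyEvaluation, InverseProbabilityWeighting as IPW

# Logged feedback collected by the Random policy on the "all" campaign.
dataset = OpenBanditDataset(behavior_policy="random", campaign="all")
bandit_feedback = dataset.obtain_batch_bandit_feedback()

# Action distribution of the counterfactual policy to be evaluated.
evaluation_policy = BernoulliTS(
    n_actions=dataset.n_actions,
    len_list=dataset.len_list,
    is_zozotown_prior=True,
    campaign="all",
    random_state=12345,
)
action_dist = evaluation_policy.compute_batch_action_dist(
    n_sim=100_000, n_rounds=bandit_feedback["n_rounds"]
)

# Estimate the policy value of BernoulliTS from the Random policy's logs.
ope = OffPolicyEvaluation(bandit_feedback=bandit_feedback, ope_estimators=[IPW()])
estimates = ope.estimate_policy_values(action_dist=action_dist)
```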
Off-policy evaluation of slate bandit policies via optimizing abstraction
We study off-policy evaluation (OPE) in the problem of slate contextual bandits where a
policy selects multi-dimensional actions known as slates. This problem is widespread in …
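The difficulty the abstraction approach targets is visible in the standard slate IPS baseline: with factored policies, the slate-level importance weight is a product over slots, so its variance explodes as the slate grows. A minimal sketch of that baseline (assumed here for contrast; it is not the paper's abstraction-based estimator):

```python
import numpy as np

def slate_ips(pi_slot, mu_slot, rewards):
    """Plain slate IPS for factored policies.

    pi_slot, mu_slot: (n, L) per-slot probabilities of the logged slate
    under the target and logging policies; rewards: (n,) slate rewards.
    The full-slate weight prod_l pi_l/mu_l grows multiplicatively with
    slate length L, which is what abstraction-based estimators try to avoid.
    """
    w = np.prod(np.asarray(pi_slot) / np.asarray(mu_slot), axis=1)
    return float(np.mean(w * np.asarray(rewards)))
```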
Adaptive estimator selection for off-policy evaluation
We develop a generic data-driven method for estimator selection in off-policy policy
evaluation settings. We establish a strong performance guarantee for the method, showing …
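A Lepski-style selection rule conveys the flavor of such data-driven methods: order candidate estimators so their confidence intervals shrink (variance falls) while unobserved bias may grow, then keep the last estimator whose interval still overlaps all earlier ones. The sketch below is an assumed simplification in this spirit, not the paper's exact procedure:

```python
import numpy as np

def lepski_select(estimates, ci_widths):
    """Select an estimator index via interval intersection.

    estimates: point estimates ordered by decreasing ci_widths
    (i.e., decreasing variance, potentially increasing bias).
    """
    lo = np.asarray(estimates) - np.asarray(ci_widths)
    hi = np.asarray(estimates) + np.asarray(ci_widths)
    chosen = 0
    for i in range(1, len(estimates)):
        # Interval i must intersect every earlier interval to remain trusted;
        # the first failure signals that bias has begun to dominate.
        if all(lo[i] <= hi[j] and lo[j] <= hi[i] for j in range(i)):
            chosen = i
        else:
            break
    return chosen
```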
Doubly robust policy evaluation and learning
We study decision making in environments where the reward is only partially observed, but
can be modeled as a function of an action and an observed context. This setting, known as …
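The doubly robust estimator combines a fitted reward model with an importance-weighted correction on the logged action, and is consistent if either component is correct. A minimal sketch of the standard construction (helper names such as `q_hat`, `pi_target`, and `mu_logged` are illustrative):

```python
import numpy as np

def doubly_robust(X, a, r, pi_target, mu_logged, q_hat, n_actions):
    """Standard doubly robust OPE estimator.

    q_hat(x, act): fitted reward model; pi_target / mu_logged: action
    probabilities under the target and logging policies. The estimate is
    consistent if either q_hat or mu_logged is correct, hence "doubly" robust.
    """
    n = len(r)
    total = 0.0
    for i in range(n):
        # Direct-method term: model value under the target policy.
        dm = sum(pi_target(X[i], act) * q_hat(X[i], act) for act in range(n_actions))
        # Importance-weighted correction on the logged action.
        w = pi_target(X[i], a[i]) / mu_logged(X[i], a[i])
        total += dm + w * (r[i] - q_hat(X[i], a[i]))
    return total / n
```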
Optimal and adaptive off-policy evaluation in contextual bandits
We study the off-policy evaluation problem—estimating the value of a target policy using
data collected by another policy—under the contextual bandit model. We consider the …
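One adaptive idea from this line of work is to switch between estimators per action: use importance weighting where the weight is small and fall back to the reward model where it is large. The sketch below follows the spirit of the paper's SWITCH estimator, with assumed helper names and a threshold `tau` chosen by the user:

```python
import numpy as np

def switch_estimator(X, a, r, pi_target, mu_logged, q_hat, n_actions, tau):
    """Switch-style OPE: importance weighting below the weight threshold tau,
    fitted reward model above it (illustrative sketch, not a verbatim
    implementation from the paper).
    """
    n = len(r)
    total = 0.0
    for i in range(n):
        # Model term over actions whose importance weight exceeds tau.
        for act in range(n_actions):
            rho = pi_target(X[i], act) / mu_logged(X[i], act)
            if rho > tau:
                total += pi_target(X[i], act) * q_hat(X[i], act)
        # IPS term on the logged action, kept only if its weight is small.
        rho_i = pi_target(X[i], a[i]) / mu_logged(X[i], a[i])
        if rho_i <= tau:
            total += rho_i * r[i]
    return total / n
```

Small `tau` recovers the direct method and large `tau` recovers IPS, so the threshold trades bias against variance.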