Quantile off-policy evaluation via deep conditional generative learning
Off-policy evaluation (OPE) is concerned with evaluating a new target policy using offline
data generated by a potentially different behavior policy. It is critical in a number of …
Off-policy evaluation in infinite-horizon reinforcement learning with latent confounders
Off-policy evaluation (OPE) in reinforcement learning is an important problem in settings
where experimentation is limited, such as healthcare. But, in these very same settings …
Off-policy evaluation for large action spaces via conjunct effect modeling
We study off-policy evaluation (OPE) of contextual bandit policies for large discrete action
spaces where conventional importance-weighting approaches suffer from excessive …
Local metric learning for off-policy evaluation in contextual bandits with continuous actions
We consider local kernel metric learning for off-policy evaluation (OPE) of deterministic
policies in contextual bandits with continuous action spaces. Our work is motivated by …
Control variates for slate off-policy evaluation
N Vlassis, A Chandrashekar… - Advances in Neural …, 2021 - proceedings.neurips.cc
We study the problem of off-policy evaluation from batched contextual bandit data with
multidimensional actions, often termed slates. The problem is common to recommender …
Balanced off-policy evaluation in general action spaces
Estimation of importance sampling weights for off-policy evaluation of contextual bandits
often results in imbalance—a mismatch between the desired and the actual distribution of …
Policy-adaptive estimator selection for off-policy evaluation
Off-policy evaluation (OPE) aims to accurately evaluate the performance of counterfactual
policies using only offline logged data. Although many estimators have been developed …
State relevance for off-policy evaluation
Importance sampling-based estimators for off-policy evaluation (OPE) are valued for their
simplicity, unbiasedness, and reliance on relatively few assumptions. However, the variance …
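Several entries above rest on the same basic estimator: vanilla importance sampling, which reweights logged rewards by the ratio of target to behavior policy probabilities. A minimal sketch for the contextual-bandit case (the data layout and function names here are illustrative assumptions, not any paper's API) makes both properties mentioned in the abstract concrete — the estimator is unbiased when the behavior probabilities are correct, but the weights blow up when the two policies diverge:

```python
def is_ope(logged, target_prob):
    """Vanilla importance-sampling OPE for contextual bandits.

    logged: list of (context, action, reward, behavior_prob) tuples,
            where behavior_prob is the logging policy's probability
            of the logged action in that context.
    target_prob(context, action): probability the target policy
            assigns to `action` in `context`.
    """
    total = 0.0
    for x, a, r, p_b in logged:
        w = target_prob(x, a) / p_b  # importance weight
        total += w * r
    return total / len(logged)

# Toy example: two actions, uniform behavior policy (p_b = 0.5),
# deterministic target policy that always picks action 1.
logged = [
    (0, 1, 1.0, 0.5),  # target agrees with logged action: weight 2
    (0, 0, 0.0, 0.5),  # target disagrees: weight 0
    (1, 1, 1.0, 0.5),
    (1, 0, 1.0, 0.5),
]
target = lambda x, a: 1.0 if a == 1 else 0.0
print(is_ope(logged, target))  # (2*1.0 + 0 + 2*1.0 + 0) / 4 = 1.0
```

The deterministic-target case also shows why the large-action-space and continuous-action papers above exist: with many (or uncountably many) actions, the target rarely matches the logged action, so most weights are zero and the few nonzero ones are huge, inflating variance.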
Towards Soft Fairness in Restless Multi-Armed Bandits
D Li, P Varakantham - arXiv preprint arXiv:2207.13343, 2022 - arxiv.org
Restless multi-armed bandits (RMABs) are a framework for allocating limited resources under
uncertainty. It is an extremely useful model for monitoring beneficiaries and executing timely …
Doubly-Robust Off-Policy Evaluation with Estimated Logging Policy
We introduce a novel doubly-robust (DR) off-policy evaluation (OPE) estimator for Markov
decision processes, DRUnknown, designed for situations where both the logging policy and …