Behaviour policy estimation in off-policy policy evaluation: Calibration matters

A Raghu, O Gottesman, Y Liu, M Komorowski… - arXiv preprint arXiv …, 2018 - arxiv.org
In this work, we consider the problem of estimating a behaviour policy for use in Off-Policy
Policy Evaluation (OPE) when the true behaviour policy is unknown. Via a series of …
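
The snippet describes the common recipe of fitting a behaviour-policy model from the logged data and plugging it into importance-sampling weights. Below is a minimal sketch of that recipe, assuming a synthetic contextual-bandit log, a hypothetical target policy `pi_e_prob`, and logistic regression as the (calibration-sensitive) behaviour-policy model; it is not the paper's specific method.

```python
# Minimal sketch, assuming a synthetic contextual-bandit log and a hypothetical
# target policy `pi_e_prob`: estimate the behaviour policy from the log, then
# use it in ordinary importance-sampling weights. Logistic regression stands in
# for whatever (calibration-sensitive) behaviour-policy model one would fit.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Logged data: states, binary actions drawn from the (unknown) behaviour policy.
n = 5000
states = rng.normal(size=(n, 3))
p_b_true = 1.0 / (1.0 + np.exp(-(states @ np.array([1.0, -0.5, 0.2]))))
actions = rng.binomial(1, p_b_true)
rewards = actions * (states[:, 0] > 0) + rng.normal(0.0, 0.1, size=n)

def pi_e_prob(s, a):
    """Hypothetical target policy: prefers action 1 when s[0] > 0."""
    prefer_1 = (s[:, 0] > 0).astype(float)
    p1 = 0.9 * prefer_1 + 0.1 * (1.0 - prefer_1)
    return np.where(a == 1, p1, 1.0 - p1)

# Estimate the behaviour policy; the calibration of these probabilities is what
# drives the quality of the IS weights below.
behaviour_model = LogisticRegression().fit(states, actions)
pi_b_hat = behaviour_model.predict_proba(states)[np.arange(n), actions]

weights = pi_e_prob(states, actions) / np.clip(pi_b_hat, 1e-3, None)
print(f"IS estimate of target-policy value: {np.mean(weights * rewards):.3f}")
```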

Deeply-debiased off-policy interval estimation

C Shi, R Wan, V Chernozhukov… - … conference on machine …, 2021 - proceedings.mlr.press
Off-policy evaluation learns a target policy's value with a historical dataset generated by a
different behavior policy. In addition to a point estimate, many applications would benefit …
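
As a point of reference for the "point estimate plus interval" framing, here is a minimal sketch of the generic baseline: a normal-approximation (Wald) interval around per-trajectory importance-sampling estimates. The deeply-debiased construction in the paper is different and stronger; the synthetic `per_traj_is` inputs below are assumptions for illustration.

```python
# Minimal sketch of the generic "point estimate plus interval" baseline: a
# normal-approximation (Wald) confidence interval around per-trajectory
# importance-sampling estimates. This is NOT the deeply-debiased interval from
# the paper; the synthetic per-trajectory values are purely illustrative.
import numpy as np
from scipy import stats

def wald_interval(per_traj_is, alpha=0.05):
    """Point estimate and (1 - alpha) Wald interval for the policy value."""
    n = len(per_traj_is)
    point = float(np.mean(per_traj_is))
    std_err = np.std(per_traj_is, ddof=1) / np.sqrt(n)
    z = stats.norm.ppf(1.0 - alpha / 2.0)
    return point, (point - z * std_err, point + z * std_err)

fake_is_values = np.random.default_rng(1).normal(loc=1.0, scale=2.0, size=200)
point, (lo, hi) = wald_interval(fake_is_values)
print(f"value ~ {point:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```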

An instrumental variable approach to confounded off-policy evaluation

Y Xu, J Zhu, C Shi, S Luo… - … Conference on Machine …, 2023 - proceedings.mlr.press
Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-
collected observational data generated by a potentially different behavior policy. In many …

Accountable off-policy evaluation with kernel Bellman statistics

Y Feng, T Ren, Z Tang, Q Liu - International Conference on …, 2020 - proceedings.mlr.press
We consider off-policy evaluation (OPE), which evaluates the performance of a new policy
from observed data collected from previous experiments, without requiring the execution of …

When is off-policy evaluation useful? A data-centric perspective

H Sun, AJ Chan, N Seedat, A Hüyük… - arXiv preprint arXiv …, 2023 - arxiv.org
Evaluating the value of a hypothetical target policy with only a logged dataset is important
but challenging. On the one hand, it brings opportunities for safe policy improvement under …

Off-policy evaluation with deficient support using side information

N Felicioni, M Ferrari Dacrema… - Advances in …, 2022 - proceedings.neurips.cc
The Off-Policy Evaluation (OPE) problem consists of evaluating the performance of
new policies using data collected by a different one. OPE is crucial when evaluating a new …

Non-asymptotic confidence intervals of off-policy evaluation: Primal and dual bounds

Y Feng, Z Tang, N Zhang, Q Liu - arXiv preprint arXiv:2103.05741, 2021 - arxiv.org
Off-policy evaluation (OPE) is the task of estimating the expected reward of a given policy
based on offline data previously collected under different policies. Therefore, OPE is a key …
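
To illustrate the contrast with asymptotic intervals, the sketch below builds a Hoeffding-style interval that is valid at every sample size, given only a known range for the per-trajectory terms. It is not the paper's primal/dual bounds; the clipped synthetic values are an assumption.

```python
# Minimal sketch of a non-asymptotic (Hoeffding-style) interval: it holds for
# every sample size, given only that each per-trajectory term lies in a known
# range. The paper's primal/dual bounds are constructed differently and are
# tighter; the clipped synthetic values here are an assumption.
import numpy as np

def hoeffding_interval(values, lower, upper, delta=0.05):
    """(1 - delta)-confidence interval, valid at any n, for bounded values."""
    n = len(values)
    half_width = (upper - lower) * np.sqrt(np.log(2.0 / delta) / (2.0 * n))
    mean = float(np.mean(values))
    return mean - half_width, mean + half_width

vals = np.clip(np.random.default_rng(2).normal(0.5, 0.2, size=500), 0.0, 1.0)
print(hoeffding_interval(vals, lower=0.0, upper=1.0))
```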

Policy-adaptive estimator selection for off-policy evaluation

T Udagawa, H Kiyohara, Y Narita, Y Saito… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Off-policy evaluation (OPE) aims to accurately evaluate the performance of counterfactual
policies using only offline logged data. Although many estimators have been developed …

Understanding the curse of horizon in off-policy evaluation via conditional importance sampling

Y Liu, PL Bacon, E Brunskill - International Conference on …, 2020 - proceedings.mlr.press
Off-policy policy estimators that use importance sampling (IS) can suffer from high variance
in long-horizon domains, and there has been particular excitement over new IS methods that …
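
A small simulation makes the variance point concrete: trajectory-level IS multiplies all importance ratios into a single weight, while per-decision IS only accumulates ratios up to each step, so its variance grows far more slowly with the horizon. The toy two-action setting and the fixed behaviour/target action probabilities below are illustrative assumptions, not the conditional-IS estimators studied in the paper.

```python
# Toy illustration of the curse of horizon, under assumed two-action behaviour
# and target policies. Trajectory-level IS weights each return by the product
# of all H importance ratios; per-decision IS weights each reward only by the
# ratios up to that step, so its variance grows much more slowly with H.
import numpy as np

rng = np.random.default_rng(3)
H, n_traj = 50, 2000
p_b, p_e = 0.5, 0.7            # P(action = 1) under behaviour / target policy

traj_is, per_decision_is = [], []
for _ in range(n_traj):
    a = rng.binomial(1, p_b, size=H)                     # logged actions
    r = a + rng.normal(0.0, 0.1, size=H)                 # per-step rewards
    rho = np.where(a == 1, p_e / p_b, (1 - p_e) / (1 - p_b))
    w = np.cumprod(rho)                                  # w_t = prod_{k<=t} rho_k
    traj_is.append(w[-1] * r.sum())                      # one weight per trajectory
    per_decision_is.append(np.sum(w * r))                # one weight per step

print(f"trajectory IS:   mean {np.mean(traj_is):.2f}, std {np.std(traj_is):.2f}")
print(f"per-decision IS: mean {np.mean(per_decision_is):.2f}, "
      f"std {np.std(per_decision_is):.2f}")
```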

Distributional shift-aware off-policy interval estimation: A unified error quantification framework

W Zhou, Y Li, R Zhu, A Qu - arXiv preprint arXiv:2309.13278, 2023 - arxiv.org
We study high-confidence off-policy evaluation in the context of infinite-horizon Markov
decision processes, where the objective is to establish a confidence interval (CI) for the …