A novel evaluation methodology for assessing off-policy learning methods in contextual bandits

N Hassanpour, R Greiner - … on Artificial Intelligence, Canadian AI 2018 …, 2018 - Springer
We propose a novel evaluation methodology for assessing off-policy learning methods in
contextual bandits. In particular, we provide a way to use data from any given Randomized …

Policy evaluation with latent confounders via optimal balance

A Bennett, N Kallus - Advances in neural information …, 2019 - proceedings.neurips.cc
Evaluating novel contextual bandit policies using logged data is crucial in applications
where exploration is costly, such as medicine. But it usually relies on the assumption of no …

Anytime-valid off-policy inference for contextual bandits

I Waudby-Smith, L Wu, A Ramdas… - ACM/IMS Journal of …, 2024 - dl.acm.org
Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in
healthcare and the tech industry. They involve online learning algorithms that adaptively …

Local metric learning for off-policy evaluation in contextual bandits with continuous actions

H Lee, J Lee, Y Choi, W Jeon, BJ Lee… - Advances in …, 2022 - proceedings.neurips.cc
We consider local kernel metric learning for off-policy evaluation (OPE) of deterministic
policies in contextual bandits with continuous action spaces. Our work is motivated by …

Marginal density ratio for off-policy evaluation in contextual bandits

MF Taufiq, A Doucet, R Cornish… - Advances in Neural …, 2024 - proceedings.neurips.cc
Off-Policy Evaluation (OPE) in contextual bandits is crucial for assessing new
policies using existing data without costly experimentation. However, current OPE methods …

Empirical likelihood for contextual bandits

N Karampatziakis, J Langford… - Advances in Neural …, 2020 - proceedings.neurips.cc
We propose an estimator and confidence interval for computing the value of a policy from off-
policy data in the contextual bandit setting. To this end we apply empirical likelihood …
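The "value of a policy" targeted by this and the other estimators listed here is conventionally written as follows; this is a sketch of the standard notation (the symbols V, π, μ, d, r are generic labels, not drawn from any one of these papers):

```latex
V(\pi) \;=\; \mathbb{E}_{x \sim d}\,\mathbb{E}_{a \sim \pi(\cdot \mid x)}\bigl[r(x,a)\bigr],
\qquad
\hat{V}_{\mathrm{IPS}}(\pi) \;=\; \frac{1}{n} \sum_{i=1}^{n} \frac{\pi(a_i \mid x_i)}{\mu(a_i \mid x_i)}\, r_i ,
```

where d is the context distribution, μ the logging policy that generated the data, and the right-hand expression the basic inverse propensity score estimate of V(π).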

Conformal off-policy prediction in contextual bandits

MF Taufiq, JF Ton, R Cornish… - Advances in Neural …, 2022 - proceedings.neurips.cc
Most off-policy evaluation methods for contextual bandits have focused on the expected
outcome of a policy, which is estimated via methods that at best provide only asymptotic …

Sample-efficient nonstationary policy evaluation for contextual bandits

M Dudík, D Erhan, J Langford, L Li - arXiv preprint arXiv:1210.4862, 2012 - arxiv.org
We present and prove properties of a new offline policy evaluator for an exploration learning
setting which is superior to previous evaluators. In particular, it simultaneously and correctly …

Combining Online Learning and Offline Learning for Contextual Bandits with Deficient Support

H Tran-The, S Gupta, T Nguyen-Tang, S Rana… - arXiv preprint arXiv …, 2021 - arxiv.org
We address policy learning with logged data in contextual bandits. Current offline-policy
learning algorithms are mostly based on inverse propensity score (IPS) weighting requiring …
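The inverse propensity score (IPS) weighting mentioned in this entry can be sketched in a few lines; this is a minimal illustration (the function name, toy data, and deterministic target policy are my own, not from the paper):

```python
import numpy as np

def ips_value_estimate(contexts, actions, rewards, logging_probs, target_policy):
    """Inverse propensity score (IPS) estimate of a target policy's value
    from logged bandit data (x_i, a_i, r_i) gathered by a logging policy
    with action probabilities logging_probs[i] = mu(a_i | x_i).

    target_policy(x, a) must return pi(a | x), the target policy's
    probability of choosing action a in context x.
    """
    pi_probs = np.array([target_policy(x, a) for x, a in zip(contexts, actions)])
    weights = pi_probs / np.asarray(logging_probs, dtype=float)
    return float(np.mean(weights * np.asarray(rewards, dtype=float)))

# Toy example: uniform logging over two actions, a target policy that
# always plays action 0, and action 0 deterministically yielding reward 1.
est = ips_value_estimate(
    contexts=[0, 1, 2, 3],
    actions=[0, 1, 0, 1],
    rewards=[1.0, 0.0, 1.0, 0.0],
    logging_probs=[0.5, 0.5, 0.5, 0.5],
    target_policy=lambda x, a: 1.0 if a == 0 else 0.0,
)
# → 1.0 (the target policy's true value in this toy setup)
```

The "deficient support" problem the paper addresses arises exactly when some logging probability mu(a_i | x_i) is zero (or near zero) for an action the target policy would take, making the ratio above undefined or high-variance.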

Post-contextual-bandit inference

A Bibaut, M Dimakopoulou, N Kallus… - Advances in neural …, 2021 - proceedings.neurips.cc
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-
commerce, healthcare, and policymaking because they can both improve outcomes for …