A novel evaluation methodology for assessing off-policy learning methods in contextual bandits

N Hassanpour, R Greiner - … on Artificial Intelligence, Canadian AI 2018 …, 2018 - Springer
We propose a novel evaluation methodology for assessing off-policy learning methods in
contextual bandits. In particular, we provide a way to use data from any given Randomized …

Policy evaluation with latent confounders via optimal balance

A Bennett, N Kallus - Advances in neural information …, 2019 - proceedings.neurips.cc
Evaluating novel contextual bandit policies using logged data is crucial in applications
where exploration is costly, such as medicine. But it usually relies on the assumption of no …

Anytime-valid off-policy inference for contextual bandits

I Waudby-Smith, L Wu, A Ramdas… - ACM/IMS Journal of …, 2024 - dl.acm.org
Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in
healthcare and the tech industry. They involve online learning algorithms that adaptively …

Local metric learning for off-policy evaluation in contextual bandits with continuous actions

H Lee, J Lee, Y Choi, W Jeon, BJ Lee… - Advances in …, 2022 - proceedings.neurips.cc
We consider local kernel metric learning for off-policy evaluation (OPE) of deterministic
policies in contextual bandits with continuous action spaces. Our work is motivated by …

Marginal density ratio for off-policy evaluation in contextual bandits

MF Taufiq, A Doucet, R Cornish… - Advances in Neural …, 2024 - proceedings.neurips.cc
Off-Policy Evaluation (OPE) in contextual bandits is crucial for assessing new
policies using existing data without costly experimentation. However, current OPE methods …

Empirical likelihood for contextual bandits

N Karampatziakis, J Langford… - Advances in Neural …, 2020 - proceedings.neurips.cc
We propose an estimator and confidence interval for computing the value of a policy from off-
policy data in the contextual bandit setting. To this end we apply empirical likelihood …
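The "value of a policy" targeted by this and the other estimators listed here is conventionally written as follows; this is a sketch of the standard notation (the symbols V, π, μ, d, r are generic labels, not drawn from any one of these papers):

```latex
V(\pi) \;=\; \mathbb{E}_{x \sim d}\,\mathbb{E}_{a \sim \pi(\cdot \mid x)}\bigl[r(x,a)\bigr],
\qquad
\hat{V}_{\mathrm{IPS}}(\pi) \;=\; \frac{1}{n} \sum_{i=1}^{n} \frac{\pi(a_i \mid x_i)}{\mu(a_i \mid x_i)}\, r_i ,
```

where d is the context distribution, μ the logging policy that generated the data, and the right-hand expression the basic inverse propensity score estimate of V(π).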

Conformal off-policy prediction in contextual bandits

MF Taufiq, JF Ton, R Cornish… - Advances in Neural …, 2022 - proceedings.neurips.cc
Most off-policy evaluation methods for contextual bandits have focused on the expected
outcome of a policy, which is estimated via methods that at best provide only asymptotic …

Sample-efficient nonstationary policy evaluation for contextual bandits

M Dudík, D Erhan, J Langford, L Li - arXiv preprint arXiv:1210.4862, 2012 - arxiv.org
We present and prove properties of a new offline policy evaluator for an exploration learning
setting which is superior to previous evaluators. In particular, it simultaneously and correctly …

Combining Online Learning and Offline Learning for Contextual Bandits with Deficient Support

H Tran-The, S Gupta, T Nguyen-Tang, S Rana… - arXiv preprint arXiv …, 2021 - arxiv.org
We address policy learning with logged data in contextual bandits. Current offline-policy
learning algorithms are mostly based on inverse propensity score (IPS) weighting requiring …
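The inverse propensity score (IPS) weighting mentioned in this entry can be sketched in a few lines; this is a minimal illustration (the function name, toy data, and deterministic target policy are my own, not from the paper):

```python
import numpy as np

def ips_value_estimate(contexts, actions, rewards, logging_probs, target_policy):
    """Inverse propensity score (IPS) estimate of a target policy's value
    from logged bandit data (x_i, a_i, r_i) gathered by a logging policy
    with action probabilities logging_probs[i] = mu(a_i | x_i).

    target_policy(x, a) must return pi(a | x), the target policy's
    probability of choosing action a in context x.
    """
    pi_probs = np.array([target_policy(x, a) for x, a in zip(contexts, actions)])
    weights = pi_probs / np.asarray(logging_probs, dtype=float)
    return float(np.mean(weights * np.asarray(rewards, dtype=float)))

# Toy example: uniform logging over two actions, a target policy that
# always plays action 0, and action 0 deterministically yielding reward 1.
est = ips_value_estimate(
    contexts=[0, 1, 2, 3],
    actions=[0, 1, 0, 1],
    rewards=[1.0, 0.0, 1.0, 0.0],
    logging_probs=[0.5, 0.5, 0.5, 0.5],
    target_policy=lambda x, a: 1.0 if a == 0 else 0.0,
)
# → 1.0 (the target policy's true value in this toy setup)
```

The "deficient support" problem the paper addresses arises exactly when some logging probability mu(a_i | x_i) is zero (or near zero) for an action the target policy would take, making the ratio above undefined or high-variance.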

Post-contextual-bandit inference

A Bibaut, M Dimakopoulou, N Kallus… - Advances in neural …, 2021 - proceedings.neurips.cc
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-
commerce, healthcare, and policymaking because they can both improve outcomes for …