A novel evaluation methodology for assessing off-policy learning methods in contextual bandits
N Hassanpour, R Greiner - … on Artificial Intelligence, Canadian AI 2018 …, 2018 - Springer
We propose a novel evaluation methodology for assessing off-policy learning methods in
contextual bandits. In particular, we provide a way to use data from any given Randomized …
Policy evaluation with latent confounders via optimal balance
Evaluating novel contextual bandit policies using logged data is crucial in applications
where exploration is costly, such as medicine. But it usually relies on the assumption of no …
Anytime-valid off-policy inference for contextual bandits
Contextual bandit algorithms are ubiquitous tools for active sequential experimentation in
healthcare and the tech industry. They involve online learning algorithms that adaptively …
Local metric learning for off-policy evaluation in contextual bandits with continuous actions
We consider local kernel metric learning for off-policy evaluation (OPE) of deterministic
policies in contextual bandits with continuous action spaces. Our work is motivated by …
Marginal density ratio for off-policy evaluation in contextual bandits
Abstract Off-Policy Evaluation (OPE) in contextual bandits is crucial for assessing new
policies using existing data without costly experimentation. However, current OPE methods …
Empirical likelihood for contextual bandits
N Karampatziakis, J Langford… - Advances in Neural …, 2020 - proceedings.neurips.cc
We propose an estimator and confidence interval for computing the value of a policy from off-
policy data in the contextual bandit setting. To this end we apply empirical likelihood …
Conformal off-policy prediction in contextual bandits
Most off-policy evaluation methods for contextual bandits have focused on the expected
outcome of a policy, which is estimated via methods that at best provide only asymptotic …
Sample-efficient nonstationary policy evaluation for contextual bandits
We present and prove properties of a new offline policy evaluator for an exploration learning
setting which is superior to previous evaluators. In particular, it simultaneously and correctly …
Combining Online Learning and Offline Learning for Contextual Bandits with Deficient Support
We address policy learning with logged data in contextual bandits. Current offline-policy
learning algorithms are mostly based on inverse propensity score (IPS) weighting requiring …
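The inverse propensity score (IPS) weighting mentioned in this abstract can be sketched as follows. This is a minimal illustration of the standard IPS value estimator, not the paper's method; the function name and the optional weight clipping are our own choices:

```python
import numpy as np

def ips_value(rewards, logging_probs, target_probs, clip=None):
    """IPS estimate of a target policy's value from logged bandit data:
    V_hat = mean( pi(a|x) / mu(a|x) * r ), where mu is the logging policy
    and pi is the target policy being evaluated."""
    w = np.asarray(target_probs) / np.asarray(logging_probs)
    if clip is not None:
        # Optional weight clipping: trades a little bias for lower variance
        # when logged propensities are small (the "deficient support" issue).
        w = np.minimum(w, clip)
    return float(np.mean(w * np.asarray(rewards)))

# Toy logged data: observed rewards, logging-policy propensities mu(a|x),
# and target-policy propensities pi(a|x) for the logged actions.
rewards = [1.0, 0.0, 1.0, 1.0]
mu = [0.5, 0.5, 0.25, 0.25]
pi = [1.0, 0.0, 0.5, 0.5]
print(ips_value(rewards, mu, pi))  # 1.5
```

Note that the estimator is unbiased only when every action the target policy can take has nonzero logging propensity, which is exactly the support condition this line of work relaxes.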
Post-contextual-bandit inference
Contextual bandit algorithms are increasingly replacing non-adaptive A/B tests in e-
commerce, healthcare, and policymaking because they can both improve outcomes for …
Related searches
- contextual bandits off policy
- contextual bandits evaluation methodology
- contextual bandits offline learning
- confounding bias contextual bandit
- missing observations contextual bandit
- contextual bandits continuous actions
- contextual bandits density ratio
- contextual bandits empirical likelihood
- latent confounders policy evaluation
- optimal balance policy evaluation
- contextual bandits online learning