Optimising individual-treatment-effect using bandits

J Berrevoets, S Verboven, W Verbeke - arXiv preprint arXiv:1910.07265, 2019 - arxiv.org
Applying causal inference models in areas such as economics, healthcare and marketing
receives great interest from the machine learning community. In particular, estimating the …

Treatment effect optimisation in dynamic environments

J Berrevoets, S Verboven, W Verbeke - Journal of Causal Inference, 2022 - degruyter.com
Applying causal methods to fields such as healthcare, marketing, and economics receives
increasing interest. In particular, optimising the individual-treatment-effect, often referred to …

About evaluation metrics for contextual uplift modeling

C Renaudin, M Martin - arXiv preprint arXiv:2107.00537, 2021 - arxiv.org
In this tech report we discuss the evaluation problem of contextual uplift modeling from the
causal inference point of view. More particularly, we instantiate the individual treatment …

Marginal density ratio for off-policy evaluation in contextual bandits

MF Taufiq, A Doucet, R Cornish… - Advances in Neural …, 2024 - proceedings.neurips.cc
Off-Policy Evaluation (OPE) in contextual bandits is crucial for assessing new
policies using existing data without costly experimentation. However, current OPE methods …
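
As background for this entry: the standard baseline that such OPE work builds on is the inverse propensity scoring (IPS) estimator. The sketch below is a generic illustration on simulated logged data, not the marginal-density-ratio method of this paper; the policies, simulated rewards, and all names in it are assumptions made for the example.

import numpy as np

rng = np.random.default_rng(0)
n, d, k = 10_000, 5, 3                      # logged rounds, context dim, actions
X = rng.normal(size=(n, d))                 # observed contexts

def logging_policy(x):
    # Hypothetical uniform-random logging policy (known propensities).
    return np.full(k, 1.0 / k)

def target_policy(x):
    # Hypothetical softmax target policy we want to evaluate offline.
    scores = x[:k]
    p = np.exp(scores - scores.max())
    return p / p.sum()

# Simulate logged interactions under the logging policy.
probs_log = np.stack([logging_policy(x) for x in X])
actions = np.array([rng.choice(k, p=p) for p in probs_log])
rewards = X[np.arange(n), actions] + rng.normal(scale=0.1, size=n)

# IPS estimate: V_hat = mean_i [ pi_target(a_i|x_i) / pi_log(a_i|x_i) * r_i ]
probs_tgt = np.stack([target_policy(x) for x in X])
w = probs_tgt[np.arange(n), actions] / probs_log[np.arange(n), actions]
print("IPS value estimate:", (w * rewards).mean())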

A large scale benchmark for individual treatment effect prediction and uplift modeling

E Diemert, A Betlei, C Renaudin, MR Amini… - arXiv preprint arXiv …, 2021 - arxiv.org
Individual Treatment Effect (ITE) prediction is an important area of research in machine
learning which aims at explaining and estimating the causal impact of an action at the …
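
For orientation: the quantity such benchmarks target is the standard potential-outcomes definition of the individual (or conditional average) treatment effect, τ(x) = E[Y(1) − Y(0) | X = x], i.e. the expected difference in outcome with and without the action for a unit with covariates x; uplift models estimate τ(x) from randomised or observational data.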

Contextual multi-armed bandits for causal marketing

N Sawant, CB Namballa, N Sadagopan… - arXiv preprint arXiv …, 2018 - arxiv.org
This work explores the idea of a causal contextual multi-armed bandit approach to
automated marketing, where we estimate and optimize the causal (incremental) effects …
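
For orientation only: the sketch below is a minimal linear contextual bandit with epsilon-greedy exploration on a simulated environment. It illustrates the general explore/exploit mechanism such systems use to choose marketing actions from context, not the specific causal (incremental) estimator in this paper; the environment and all parameters are assumptions made for the example.

import numpy as np

rng = np.random.default_rng(1)
d, k, eps, lam = 4, 3, 0.1, 1.0            # context dim, arms, exploration, ridge
A = [lam * np.eye(d) for _ in range(k)]    # per-arm Gram matrices
b = [np.zeros(d) for _ in range(k)]        # per-arm response vectors
true_theta = rng.normal(size=(k, d))       # hypothetical environment parameters

for t in range(5000):
    x = rng.normal(size=d)                 # customer context
    if rng.random() < eps:
        a = int(rng.integers(k))           # explore uniformly at random
    else:
        est = [np.linalg.solve(A[j], b[j]) @ x for j in range(k)]
        a = int(np.argmax(est))            # exploit current ridge estimates
    r = true_theta[a] @ x + rng.normal(scale=0.1)   # observed reward
    A[a] += np.outer(x, x)                 # online ridge-regression update
    b[a] += r * x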

An experimental design for anytime-valid causal inference on multi-armed bandits

B Liang, I Bojinov - arXiv preprint arXiv:2311.05794, 2023 - arxiv.org
Typically, multi-armed bandit (MAB) experiments are analyzed at the end of the study and
thus require the analyst to specify a fixed sample size in advance. However, in many online …

Policy evaluation with latent confounders via optimal balance

A Bennett, N Kallus - Advances in neural information …, 2019 - proceedings.neurips.cc
Evaluating novel contextual bandit policies using logged data is crucial in applications
where exploration is costly, such as medicine. But it usually relies on the assumption of no …

Rarely-switching linear bandits: optimization of causal effects for the real world

B Lansdell, S Triantafillou, K Kording - arXiv preprint arXiv:1905.13121, 2019 - arxiv.org
Excessively changing policies in many real world scenarios is difficult, unethical, or
expensive. After all, doctor guidelines, tax codes, and price lists can only be reprinted so …

Off-policy evaluation and learning from logged bandit feedback: Error reduction via surrogate policy

Y Xie, B Liu, Q Liu, Z Wang, Y Zhou, J Peng - arXiv preprint arXiv …, 2018 - arxiv.org
When learning from a batch of logged bandit feedback, the discrepancy between the policy
to be learned and the off-policy training data imposes statistical and computational …