Minimax value interval for off-policy evaluation and policy optimization
We study minimax methods for off-policy evaluation (OPE) using value functions and
marginalized importance weights. Despite that they hold promises of overcoming the …
Optimising individual-treatment-effect using bandits
Applying causal inference models in areas such as economics, healthcare and marketing
receives great interest from the machine learning community. In particular, estimating the …
A Bayesian Approach to Online Learning for Contextual Restless Bandits with Applications to Public Health
Restless multi-armed bandits (RMABs) are used to model sequential resource allocation in
public health intervention programs. In these settings, the underlying transition dynamics are …
Contextual Fixed-Budget Best Arm Identification: Adaptive Experimental Design with Policy Learning
Individualized treatment recommendation is a crucial task in evidence-based decision-
making. In this study, we formulate this task as a fixed-budget best arm identification (BAI) …
Beyond reward: Offline preference-guided policy optimization
This study focuses on the topic of offline preference-based reinforcement learning (PbRL), a
variant of conventional reinforcement learning that dispenses with the need for online …
Fairness Evaluation for Uplift Modeling in the Absence of Ground Truth
S Kadioglu, F Michalsky - arXiv preprint arXiv:2403.12069, 2024 - arxiv.org
The acceleration in the adoption of AI-based automated decision-making systems poses a
challenge for evaluating the fairness of algorithmic decisions, especially in the absence of …
Semi-parametric efficient policy learning with continuous actions
V Chernozhukov, M Demirer… - Advances in Neural …, 2019 - proceedings.neurips.cc
We consider off-policy evaluation and optimization with continuous action spaces. We focus
on observational data where the data collection policy is unknown and needs to be …
Equitable Restless Multi-Armed Bandits: A General Framework Inspired By Digital Health
Restless multi-armed bandits (RMABs) are a popular framework for algorithmic decision
making in sequential settings with limited resources. RMABs are increasingly being used for …
Robust fitted-q-evaluation and iteration under sequentially exogenous unobserved confounders
D Bruns-Smith, A Zhou - arXiv preprint arXiv:2302.00662, 2023 - arxiv.org
Offline reinforcement learning is important in domains such as medicine, economics, and e-
commerce where online experimentation is costly, dangerous or unethical, and where the …