Minimax value interval for off-policy evaluation and policy optimization

N Jiang, J Huang - Advances in Neural Information …, 2020 - proceedings.neurips.cc
We study minimax methods for off-policy evaluation (OPE) using value functions and
marginalized importance weights. Despite that they hold promises of overcoming the …

Optimising individual-treatment-effect using bandits

J Berrevoets, S Verboven, W Verbeke - arXiv preprint arXiv:1910.07265, 2019 - arxiv.org
Applying causal inference models in areas such as economics, healthcare and marketing
receives great interest from the machine learning community. In particular, estimating the …

A Bayesian Approach to Online Learning for Contextual Restless Bandits with Applications to Public Health

B Liang, L Xu, A Taneja, M Tambe, L Janson - arXiv preprint arXiv …, 2024 - arxiv.org
Restless multi-armed bandits (RMABs) are used to model sequential resource allocation in
public health intervention programs. In these settings, the underlying transition dynamics are …

Contextual Fixed-Budget Best Arm Identification: Adaptive Experimental Design with Policy Learning

M Kato, K Okumura, T Ishihara, T Kitagawa - arXiv preprint arXiv …, 2024 - arxiv.org
Individualized treatment recommendation is a crucial task in evidence-based decision-making.
In this study, we formulate this task as a fixed-budget best arm identification (BAI) …

Beyond reward: Offline preference-guided policy optimization

Y Kang, D Shi, J Liu, L He, D Wang - arXiv preprint arXiv:2305.16217, 2023 - arxiv.org
This study focuses on the topic of offline preference-based reinforcement learning (PbRL), a
variant of conventional reinforcement learning that dispenses with the need for online …

[CITATION] Harnessing infinite-horizon off-policy evaluation: Double robustness via duality

Z Tang, Y Feng, L Li, D Zhou, Q Liu - ICLR 2020, 2020

Fairness Evaluation for Uplift Modeling in the Absence of Ground Truth

S Kadioglu, F Michalsky - arXiv preprint arXiv:2403.12069, 2024 - arxiv.org
The acceleration in the adoption of AI-based automated decision-making systems poses a
challenge for evaluating the fairness of algorithmic decisions, especially in the absence of …

Semi-parametric efficient policy learning with continuous actions

V Chernozhukov, M Demirer… - Advances in Neural …, 2019 - proceedings.neurips.cc
We consider off-policy evaluation and optimization with continuous action spaces. We focus
on observational data where the data collection policy is unknown and needs to be …

Equitable Restless Multi-Armed Bandits: A General Framework Inspired By Digital Health

JA Killian, M Jain, Y Jia, J Amar, E Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
Restless multi-armed bandits (RMABs) are a popular framework for algorithmic decision
making in sequential settings with limited resources. RMABs are increasingly being used for …

Robust fitted-q-evaluation and iteration under sequentially exogenous unobserved confounders

D Bruns-Smith, A Zhou - arXiv preprint arXiv:2302.00662, 2023 - arxiv.org
Offline reinforcement learning is important in domains such as medicine, economics, and
e-commerce where online experimentation is costly, dangerous or unethical, and where the …