An instrumental variable approach to confounded off-policy evaluation
Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-
collected observational data generated by a potentially different behavior policy. In many …
Off-policy evaluation for human feedback
Off-policy evaluation (OPE) is important for closing the gap between offline training and
evaluation of reinforcement learning (RL), by estimating performance and/or rank of target …
Distributional shift-aware off-policy interval estimation: A unified error quantification framework
We study high-confidence off-policy evaluation in the context of infinite-horizon Markov
decision processes, where the objective is to establish a confidence interval (CI) for the …
Sample complexity of nonparametric off-policy evaluation on low-dimensional manifolds using deep networks
We consider the off-policy evaluation problem of reinforcement learning using deep
convolutional neural networks. We analyze the deep fitted Q-evaluation method for …
Policy-adaptive estimator selection for off-policy evaluation
Off-policy evaluation (OPE) aims to accurately evaluate the performance of counterfactual
policies using only offline logged data. Although many estimators have been developed …
Optimal treatment allocation for efficient policy evaluation in sequential decision making
A/B testing is critical for modern technological companies to evaluate the effectiveness of
newly developed products against standard baselines. This paper studies optimal designs …
Provable benefits of policy learning from human preferences in contextual bandit problems
A crucial task in decision-making problems is reward engineering. It is common in practice
that no obvious choice of reward function exists. Thus, a popular approach is to introduce …
A reinforcement learning framework for dynamic mediation analysis
Mediation analysis learns the causal effect transmitted via mediator variables between
treatments and outcomes, and receives increasing attention in various scientific domains to …
Development and validation of a reinforcement learning model for ventilation control during emergence from general anesthesia
Ventilation should be assisted without asynchrony or cardiorespiratory instability during
anesthesia emergence until sufficient spontaneous ventilation is recovered. In this …
Did we personalize? Assessing personalization by an online reinforcement learning algorithm using resampling
There is a growing interest in using reinforcement learning (RL) to personalize sequences of
treatments in digital health to support users in adopting healthier behaviors. Such sequential …