A review of off-policy evaluation in reinforcement learning
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …
Statistical inference of the value function for reinforcement learning in infinite-horizon settings
Reinforcement learning is a general technique that allows an agent to learn an optimal
policy and interact with an environment in sequential decision-making problems. The …
Optimal treatment regimes: a review and empirical comparison
Z Li, J Chen, E Laber, F Liu… - International Statistical …, 2023 - Wiley Online Library
A treatment regime is a sequence of decision rules, one per decision point, that maps
accumulated patient information to a recommended intervention. An optimal treatment …
Deeply-debiased off-policy interval estimation
Off-policy evaluation learns a target policy's value with a historical dataset generated by a
different behavior policy. In addition to a point estimate, many applications would benefit …
Estimating and improving dynamic treatment regimes with a time-varying instrumental variable
Estimating dynamic treatment regimes (DTRs) from retrospective observational data is
challenging as some degree of unmeasured confounding is often expected. In this work, we …
Transfer learning for contextual multi-armed bandits
The Annals of Statistics 2024,
Vol. 52, No. 1, 207–232 https://doi.org/10.1214/23-AOS2341 © Institute of Mathematical …
A multi-agent reinforcement learning framework for off-policy evaluation in two-sided markets
The two-sided markets such as ride-sharing companies often involve a group of subjects
who are making sequential decisions across time and/or location. With the rapid …
Deep jump learning for off-policy evaluation in continuous treatment settings
We consider off-policy evaluation (OPE) in continuous treatment settings, such as
personalized dose-finding. In OPE, one aims to estimate the mean outcome under a new …
Statistically efficient advantage learning for offline reinforcement learning in infinite horizons
We consider reinforcement learning (RL) methods in offline domains without additional
online data collection, such as mobile health applications. Most of existing policy …
Evaluating dynamic conditional quantile treatment effects with applications in ridesharing
Many modern tech companies, such as Google, Uber, and Didi, use online experiments
(also known as A/B testing) to evaluate new policies against existing ones. While most …