A review of off-policy evaluation in reinforcement learning
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …
A survey on causal reinforcement learning
While reinforcement learning (RL) achieves tremendous success in sequential decision-
making problems of many domains, it still faces key challenges of data inefficiency and the …
A minimax learning approach to off-policy evaluation in confounded partially observable Markov decision processes
We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes
(POMDPs), where the evaluation policy depends only on observable variables and the …
An instrumental variable approach to confounded off-policy evaluation
Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-
collected observational data generated by a potentially different behavior policy. In many …
Future-dependent value-based off-policy evaluation in POMDPs
We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general
function approximation. Existing methods such as sequential importance sampling …
Offline reinforcement learning with instrumental variables in confounded Markov decision processes
We study the offline reinforcement learning (RL) in the face of unmeasured confounders.
Due to the lack of online interaction with the environment, offline RL is facing the following …
Off-policy evaluation for episodic partially observable Markov decision processes under non-parametric models
We study the problem of off-policy evaluation (OPE) for episodic Partially Observable
Markov Decision Processes (POMDPs) with continuous states. Motivated by the recently …
Optimal treatment allocation for efficient policy evaluation in sequential decision making
A/B testing is critical for modern technological companies to evaluate the effectiveness of
newly developed products against standard baselines. This paper studies optimal designs …
Estimating and improving dynamic treatment regimes with a time-varying instrumental variable
Estimating dynamic treatment regimes (DTRs) from retrospective observational data is
challenging as some degree of unmeasured confounding is often expected. In this work, we …
A reinforcement learning framework for dynamic mediation analysis
Mediation analysis learns the causal effect transmitted via mediator variables between
treatments and outcomes, and receives increasing attention in various scientific domains to …