A review of off-policy evaluation in reinforcement learning
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has recently been applied to solve a number of challenging problems. In this …
Causal reinforcement learning: A survey
Reinforcement learning is an essential paradigm for solving sequential decision problems
under uncertainty. Despite many remarkable achievements in recent decades, applying …
Causal reinforcement learning using observational and interventional data
Efficiently learning a causal model of the environment is a key challenge for model-based
RL agents operating in POMDPs. We consider here a scenario where the learning agent …
A minimax learning approach to off-policy evaluation in confounded partially observable Markov decision processes
We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes
(POMDPs), where the evaluation policy depends only on observable variables and the …
Universal off-policy evaluation
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …
Off-policy confidence interval estimation with confounded Markov decision process
This article is concerned with constructing a confidence interval for a target policy's value
offline, based on pre-collected observational data in infinite-horizon settings. Most of the …
Proximal reinforcement learning: Efficient off-policy evaluation in partially observed Markov decision processes
In applications of offline reinforcement learning to observational data, such as in healthcare
or education, a general concern is that observed actions might be affected by unobserved …
Future-dependent value-based off-policy evaluation in POMDPs
We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general
function approximation. Existing methods such as sequential importance sampling …
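Several of these abstracts contrast their estimators with the sequential importance sampling baseline. For orientation, here is a minimal sketch of that baseline, assuming fully observed logged trajectories and known action probabilities for both policies; the function names and data layout are illustrative, not taken from any paper listed here:

    import numpy as np

    def is_estimate(trajectories, pi_e, pi_b, gamma=0.99):
        """Trajectory-wise importance sampling estimate of the value of an
        evaluation policy pi_e from data logged under a behavior policy pi_b.

        trajectories: list of trajectories, each a list of
            (obs, action, reward) tuples (layout assumed for this sketch).
        pi_e, pi_b: callables returning the probability that the
            evaluation / behavior policy takes `action` at `obs`.
        """
        estimates = []
        for traj in trajectories:
            weight, ret = 1.0, 0.0
            for t, (obs, action, reward) in enumerate(traj):
                # Cumulative likelihood ratio between the two policies.
                weight *= pi_e(obs, action) / pi_b(obs, action)
                # Discounted return of the logged trajectory.
                ret += gamma ** t * reward
            estimates.append(weight * ret)
        return float(np.mean(estimates))

This estimator is unbiased only when pi_b has full support and actions depend on nothing beyond what is recorded. In the confounded POMDP setting studied by the papers above, the behavior policy conditions on latent state, so the likelihood ratio is not identified from observations alone, which is the failure mode these methods are designed to address.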
An instrumental variable approach to confounded off-policy evaluation
Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-
collected observational data generated by a potentially different behavior policy. In many …
Pessimism in the face of confounders: Provably efficient offline reinforcement learning in partially observable Markov decision processes
We study offline reinforcement learning (RL) in partially observable Markov decision
processes. In particular, we aim to learn an optimal policy from a dataset collected by a …