A review of off-policy evaluation in reinforcement learning
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …
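As background for the entries below: off-policy evaluation (OPE) estimates the value of a target policy from data logged by a different behavior policy. A minimal sketch of the classical per-trajectory importance sampling estimator follows; it is an illustrative textbook construction, not taken from the review, and the policy callables pi_e and pi_b are assumptions of the example.

import numpy as np

def importance_sampling_ope(episodes, pi_e, pi_b, gamma=0.99):
    # episodes: list of [(state, action, reward), ...] logged under pi_b.
    # pi_e(a, s) / pi_b(a, s): action probabilities under the target and
    # behavior policies (hypothetical callables, assumed here).
    estimates = []
    for episode in episodes:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(episode):
            weight *= pi_e(a, s) / pi_b(a, s)  # cumulative likelihood ratio
            ret += gamma ** t * r              # discounted return
        estimates.append(weight * ret)
    return float(np.mean(estimates))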
Batch policy learning in average reward Markov decision processes
We consider the batch (off-line) policy learning problem in the infinite horizon Markov
Decision Process. Motivated by mobile health applications, we focus on learning a policy …
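For context, the average-reward criterion evaluates a policy by its long-run reward rate rather than a discounted sum; a standard statement of the objective (a textbook definition, not quoted from the paper) is

$$ \eta(\pi) = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_{\pi}\Big[\sum_{t=1}^{T} r_t\Big]. $$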
Statistical inference of the value function for reinforcement learning in infinite-horizon settings
Reinforcement learning is a general technique that allows an agent to learn an optimal
policy and interact with an environment in sequential decision-making problems. The …
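The inferential target here is the policy value; under discounting the standard definition (again textbook, not quoted from the paper) is

$$ V^{\pi}(s) = \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t} r_t \,\Big|\, s_0 = s\Big], $$

and a confidence interval is sought for this quantity (or its average over an initial-state distribution).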
A minimax learning approach to off-policy evaluation in confounded partially observable Markov decision processes
We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes
(POMDPs), where the evaluation policy depends only on observable variables and the …
Finite sample analysis of minimax offline reinforcement learning: Completeness, fast rates and first-order efficiency
We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement
learning using function approximation for marginal importance weights and $q$-functions …
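Here the marginal importance weight is the occupancy density ratio $w(s,a) = d^{\pi}(s,a)/d^{b}(s,a)$ between the target policy's discounted state-action occupancy and the data distribution; once it is estimated, the policy value reduces to a weighted average of one-step rewards. A minimal sketch assuming the weight model is already fitted (this estimator form is standard; the paper's minimax fitting procedure is not reproduced here):

import numpy as np

def weighted_value_estimate(states, actions, rewards, w_hat, gamma=0.99):
    # w_hat(s, a): estimated occupancy ratio d^pi / d^b
    # (a hypothetical fitted model, assumed for illustration).
    # Uses J(pi) = (1 / (1 - gamma)) * E_data[w(s, a) * r].
    w = np.array([w_hat(s, a) for s, a in zip(states, actions)])
    return float(np.mean(w * np.asarray(rewards)) / (1.0 - gamma))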
On well-posedness and minimax optimal rates of nonparametric Q-function estimation in off-policy evaluation
X Chen, Z Qi - International Conference on Machine …, 2022 - proceedings.mlr.press
We study the off-policy evaluation (OPE) problem in an infinite-horizon Markov decision
process with continuous states and actions. We recast the $Q$-function estimation into a …
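The $Q$-function in question satisfies the Bellman evaluation equation $Q^{\pi}(s,a) = \mathbb{E}[r + \gamma\, Q^{\pi}(s', a')]$ with $a' \sim \pi(\cdot \mid s')$. The paper recasts its estimation as a nonparametric problem; the sketch below shows only the generic fitted-Q evaluation loop for discrete actions, with the regressor choice and data layout assumed for illustration.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fitted_q_evaluation(S, A, R, S_next, pi_next, n_actions,
                        gamma=0.99, n_iters=50):
    # S, A, R, S_next: logged transitions (A is an integer action index).
    # pi_next: (n, n_actions) target-policy probabilities at S_next.
    X = np.column_stack([S, A])
    q_next = np.zeros((len(R), n_actions))
    for _ in range(n_iters):
        # Bellman target: r + gamma * E_{a' ~ pi}[Q(s', a')]
        y = R + gamma * np.sum(pi_next * q_next, axis=1)
        model = RandomForestRegressor(n_estimators=50).fit(X, y)
        q_next = np.column_stack([
            model.predict(np.column_stack([S_next, np.full(len(R), a)]))
            for a in range(n_actions)
        ])
    return model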
Off-policy confidence interval estimation with confounded Markov decision process
This article is concerned with constructing a confidence interval for a target policy's value
offline based on pre-collected observational data in infinite horizon settings. Most of the …
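A common template for such intervals, shown only as a generic Wald-type construction (the article's confounding-robust interval is more involved), pairs a point estimate with an estimated asymptotic variance:

$$ \hat{J} \pm z_{1-\alpha/2}\, \hat{\sigma} / \sqrt{n}, $$

where $\hat{J}$ is the value estimate, $\hat{\sigma}^2$ its estimated variance, and $n$ the sample size.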
An instrumental variable approach to confounded off-policy evaluation
Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-
collected observational data generated by a potentially different behavior policy. In many …
Distributional shift-aware off-policy interval estimation: A unified error quantification framework
We study high-confidence off-policy evaluation in the context of infinite-horizon Markov
decision processes, where the objective is to establish a confidence interval (CI) for the …
Hallucinated adversarial control for conservative offline policy evaluation
We study the problem of conservative off-policy evaluation (COPE) where given an offline
dataset of environment interactions, collected by other agents, we seek to obtain a (tight) …
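Conservative here means a high-confidence lower bound on the target policy's value. As a generic illustration (not the paper's hallucinated-adversarial-control method), such a bound can be formed from a lower percentile of bootstrap resamples of any per-episode estimator:

import numpy as np

def bootstrap_lower_bound(per_episode_estimates, alpha=0.05,
                          n_boot=2000, seed=0):
    # (1 - alpha) lower confidence bound via the percentile bootstrap
    # over per-episode value estimates.
    rng = np.random.default_rng(seed)
    x = np.asarray(per_episode_estimates, dtype=float)
    means = [rng.choice(x, size=x.size, replace=True).mean()
             for _ in range(n_boot)]
    return float(np.quantile(means, alpha))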