A review of off-policy evaluation in reinforcement learning
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has recently been applied to solve a number of challenging problems. In this …
Bridging offline reinforcement learning and imitation learning: A tale of pessimism
Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from
a fixed dataset without active data collection. Based on the composition of the offline dataset …
Representation learning for online and offline RL in low-rank MDPs
This work studies the question of Representation Learning in RL: how can we learn a
compact low-dimensional representation such that on top of the representation we can …
Pessimistic model-based offline reinforcement learning under partial coverage
We study model-based offline Reinforcement Learning with general function approximation
without a full coverage assumption on the offline data distribution. We present an algorithm …
Provably efficient reinforcement learning in partially observable dynamical systems
We study Reinforcement Learning for partially observable systems using function
approximation. We propose a new PO-bilinear framework that is general enough to include …
Finite sample analysis of minimax offline reinforcement learning: Completeness, fast rates and first-order efficiency
We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement
learning using function approximation for marginal importance weights and $q$-functions …
Provable benefits of representational transfer in reinforcement learning
We study the problem of representational transfer in RL, where an agent first pretrains in a
number of source tasks to discover a shared representation, which is subsequently …
Future-dependent value-based off-policy evaluation in POMDPs
We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general
function approximation. Existing methods such as sequential importance sampling …
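For context, sequential importance sampling corrects for the policy mismatch by reweighting each logged trajectory with the cumulative ratio of evaluation-policy to behavior-policy action probabilities. Below is a minimal sketch of the per-decision variant; the function names, trajectory layout, and probability interfaces are illustrative assumptions, not the paper's notation:

```python
import numpy as np

def sequential_is_value(trajectories, pi_e_prob, pi_b_prob, gamma=0.99):
    """Per-decision sequential importance sampling estimate of a policy's value.

    trajectories: iterable of [(obs, action, reward), ...] logged under pi_b.
    pi_e_prob(obs, action): action probability under the evaluation policy.
    pi_b_prob(obs, action): action probability under the behavior policy.
    """
    estimates = []
    for traj in trajectories:
        rho, value = 1.0, 0.0
        for t, (obs, action, reward) in enumerate(traj):
            # Cumulative likelihood ratio of the two policies up to step t.
            rho *= pi_e_prob(obs, action) / pi_b_prob(obs, action)
            value += (gamma ** t) * rho * reward
        estimates.append(value)
    return float(np.mean(estimates))
```

The cumulative ratio typically grows exponentially with the horizon (the "curse of horizon"), which is the well-known weakness that value-based alternatives such as the future-dependent approach above aim to sidestep.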
Bootstrapping fitted Q-evaluation for off-policy inference
Bootstrapping provides a flexible and effective approach for assessing the quality of batch
reinforcement learning, yet its theoretical properties are poorly understood. In this paper, we …
Off-policy fitted Q-evaluation with differentiable function approximators: Z-estimation and inference theory
Off-Policy Evaluation (OPE) serves as one of the cornerstones in Reinforcement
Learning (RL). Fitted Q Evaluation (FQE) with various function approximators, especially …
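Since the last two entries both analyze Fitted Q Evaluation, a minimal sketch of the procedure may be useful. This version assumes linear features and a deterministic target policy; all names are illustrative, not either paper's implementation:

```python
import numpy as np

def fitted_q_evaluation(transitions, pi, phi, gamma=0.99, n_iters=50):
    """Fitted Q Evaluation (FQE) with linear function approximation.

    transitions: list of (s, a, r, s_next) tuples from the logged dataset.
    pi(s): action chosen by the (deterministic) policy being evaluated.
    phi(s, a): feature map returning a length-d numpy array.
    Returns weights w such that Q(s, a) is approximated by phi(s, a) @ w.
    """
    d = phi(transitions[0][0], transitions[0][1]).shape[0]
    X = np.array([phi(s, a) for s, a, _, _ in transitions])
    rewards = np.array([r for _, _, r, _ in transitions])
    X_next = np.array([phi(s_next, pi(s_next)) for _, _, _, s_next in transitions])
    w = np.zeros(d)
    ridge = 1e-6 * np.eye(d)  # small regularizer for numerical stability
    for _ in range(n_iters):
        # Regress onto bootstrapped Bellman targets, then repeat to convergence.
        targets = rewards + gamma * (X_next @ w)
        w = np.linalg.solve(X.T @ X + ridge, X.T @ targets)
    return w
```

Evaluating phi(s0, pi(s0)) @ w at an initial state then gives the scalar value estimate whose sampling distribution the bootstrap entry above studies.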