Settling the sample complexity of model-based offline reinforcement learning
The Annals of Statistics, 2024, Vol. 52, No. 1, 233–260. https://doi.org/10.1214/23-AOS2342
On well-posedness and minimax optimal rates of nonparametric Q-function estimation in off-policy evaluation
X Chen, Z Qi - International Conference on Machine …, 2022 - proceedings.mlr.press
We study the off-policy evaluation (OPE) problem in an infinite-horizon Markov decision
process with continuous states and actions. We recast the $Q$-function estimation into a …
Sample complexity of nonparametric off-policy evaluation on low-dimensional manifolds using deep networks
We consider the off-policy evaluation problem of reinforcement learning using deep
convolutional neural networks. We analyze the deep fitted Q-evaluation method for …
On instance-dependent bounds for offline reinforcement learning with linear function approximation
Sample-efficient offline reinforcement learning (RL) with linear function approximation has
been studied extensively recently. Much of the prior work has yielded instance-independent …
Off-policy fitted Q-evaluation with differentiable function approximators: Z-estimation and inference theory
Abstract Off-Policy Evaluation (OPE) serves as one of the cornerstones in Reinforcement
Learning (RL). Fitted Q Evaluation (FQE) with various function approximators, especially …
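The FQE loop named in this entry is easy to state. Below is a minimal sketch under linear features; `phi`, `target_policy`, and the transition format are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def fitted_q_evaluation(transitions, phi, target_policy, gamma=0.9, n_iters=50):
    """Iteratively fit Q(s, a) ~= phi(s, a) @ w by regressing onto Bellman targets.

    transitions: list of (s, a, r, s_next) tuples from behavior data (assumed format).
    phi: feature map (s, a) -> np.ndarray of shape (d,).
    target_policy: maps a state to the action the evaluated policy would take.
    """
    d = phi(*transitions[0][:2]).shape[0]
    w = np.zeros(d)
    X = np.array([phi(s, a) for s, a, _, _ in transitions])            # regression inputs
    X_next = np.array([phi(s2, target_policy(s2)) for *_, s2 in transitions])
    rew = np.array([r for _, _, r, _ in transitions])
    for _ in range(n_iters):
        y = rew + gamma * X_next @ w                                   # Bellman targets under pi
        w, *_ = np.linalg.lstsq(X, y, rcond=None)                      # least-squares fitting step
    return w                                                           # Q_hat(s, a) = phi(s, a) @ w
```

The off-policy value estimate is then obtained by averaging `phi(s0, target_policy(s0)) @ w` over initial states.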
Improved Bayesian regret bounds for Thompson sampling in reinforcement learning
A Moradipari, M Pedramfar… - Advances in …, 2023 - proceedings.neurips.cc
In this paper, we prove state-of-the-art Bayesian regret bounds for Thompson Sampling in
reinforcement learning in a multitude of settings. We present a refined analysis of the …
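As a reminder of the algorithm being analyzed, here is a hedged sketch of Thompson sampling in its simplest Gaussian bandit form; the paper treats full reinforcement-learning settings, and the conjugate prior and noise model below are simplifying assumptions.

```python
import numpy as np

def thompson_sampling(pull, n_arms, horizon, prior_var=1.0, noise_var=1.0, seed=0):
    """Gaussian Thompson sampling: draw a mean-reward vector from the posterior,
    act greedily on the draw, then update the posterior with the observed reward."""
    rng = np.random.default_rng(seed)
    post_mean = np.zeros(n_arms)
    post_var = np.full(n_arms, prior_var)
    for _ in range(horizon):
        sampled_means = rng.normal(post_mean, np.sqrt(post_var))  # one posterior sample
        arm = int(np.argmax(sampled_means))
        reward = pull(arm)
        # Conjugate Gaussian update for the chosen arm only.
        precision = 1.0 / post_var[arm] + 1.0 / noise_var
        post_mean[arm] = (post_mean[arm] / post_var[arm] + reward / noise_var) / precision
        post_var[arm] = 1.0 / precision
    return post_mean
```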
Optimal and instance-dependent guarantees for Markovian linear stochastic approximation
We study stochastic approximation procedures for approximately solving a
$d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from …
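The iterates in question take the form $\theta_{t+1} = \theta_t + \eta_t (b_t - A_t \theta_t)$, where $(A_t, b_t)$ are noisy observations of a linear fixed-point equation $\bar{A} \theta^* = \bar{b}$; TD(0) with linear features is one instance. A minimal sketch, assuming an abstract observation stream (the i.i.d. toy data below stands in for the paper's Markovian trajectory):

```python
import numpy as np

def linear_stochastic_approximation(observations, d, step=lambda t: 1.0 / (t + 10)):
    """Iterate theta <- theta + eta_t * (b_t - A_t @ theta) over a stream of
    noisy observations (A_t, b_t) of a linear fixed-point equation A theta = b."""
    theta = np.zeros(d)
    for t, (A_t, b_t) in enumerate(observations):
        theta = theta + step(t) * (b_t - A_t @ theta)
    return theta

# Toy usage with i.i.d. noise (the Markovian dependence studied in the paper is omitted).
rng = np.random.default_rng(0)
A, b = np.array([[2.0, 0.0], [0.0, 3.0]]), np.array([1.0, 1.0])
stream = ((A + 0.1 * rng.standard_normal((2, 2)),
           b + 0.1 * rng.standard_normal(2)) for _ in range(5000))
print(linear_stochastic_approximation(stream, d=2))  # approximately [0.5, 0.333]
```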
Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency
We study optimal procedures for estimating a linear functional based on observational data.
In many problems of this kind, a widely used assumption is strict overlap, i.e., uniform …
Sample complexity of offline reinforcement learning with deep ReLU networks
Offline reinforcement learning (RL) leverages previously collected data for policy
optimization without any further active exploration. Despite the recent interest in this …
A finite-sample analysis of multi-step temporal difference estimates
Y Duan, MJ Wainwright - Learning for Dynamics and Control …, 2023 - proceedings.mlr.press
We consider the problem of estimating the value function of an infinite-horizon
$\gamma$-discounted Markov reward process (MRP). We establish non-asymptotic guarantees for a …
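Concretely, a $k$-step TD estimate bootstraps each target from $k$ observed rewards plus the current value estimate $k$ steps ahead. A minimal tabular sketch, with the trajectory format and step size chosen for illustration:

```python
import numpy as np

def k_step_td(trajectory, n_states, k=5, gamma=0.9, alpha=0.1):
    """Estimate the value function V of a discounted MRP with k-step TD updates.

    trajectory: list of (state, reward) pairs from one long rollout (assumed format).
    """
    V = np.zeros(n_states)
    for t in range(len(trajectory) - k):
        s = trajectory[t][0]
        # k-step return: discounted sum of k rewards, then bootstrap from V.
        G = sum(gamma**i * trajectory[t + i][1] for i in range(k))
        G += gamma**k * V[trajectory[t + k][0]]
        V[s] += alpha * (G - V[s])  # move V(s) toward the k-step target
    return V
```

Larger $k$ reduces the bias from bootstrapping but increases the variance of the sampled return, which is precisely the trade-off finite-sample analyses of multi-step TD quantify.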