Settling the sample complexity of model-based offline reinforcement learning

G Li, L Shi, Y Chen, Y Chi, Y Wei - The Annals of Statistics, 2024 - projecteuclid.org
The Annals of Statistics, 2024, Vol. 52, No. 1, pp. 233–260. https://doi.org/10.1214/23-AOS2342

On well-posedness and minimax optimal rates of nonparametric q-function estimation in off-policy evaluation

X Chen, Z Qi - International Conference on Machine …, 2022 - proceedings.mlr.press
We study the off-policy evaluation (OPE) problem in an infinite-horizon Markov decision process with continuous states and actions. We recast the $Q$-function estimation into a …

Sample complexity of nonparametric off-policy evaluation on low-dimensional manifolds using deep networks

X Ji, M Chen, M Wang, T Zhao - arXiv preprint arXiv:2206.02887, 2022 - arxiv.org
We consider the off-policy evaluation problem of reinforcement learning using deep
convolutional neural networks. We analyze the deep fitted Q-evaluation method for …

On instance-dependent bounds for offline reinforcement learning with linear function approximation

T Nguyen-Tang, M Yin, S Gupta, S Venkatesh… - Proceedings of the …, 2023 - ojs.aaai.org
Sample-efficient offline reinforcement learning (RL) with linear function approximation has
been studied extensively recently. Much of the prior work has yielded instance-independent …

Off-policy fitted q-evaluation with differentiable function approximators: Z-estimation and inference theory

R Zhang, X Zhang, C Ni… - … Conference on Machine …, 2022 - proceedings.mlr.press
Off-Policy Evaluation (OPE) serves as one of the cornerstones in Reinforcement Learning (RL). Fitted Q Evaluation (FQE) with various function approximators, especially …
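For context, fitted Q-evaluation alternates a Bellman backup under the target policy with a regression fit on logged transitions. Below is a minimal illustrative sketch with a linear function approximator; the feature map `phi`, the deterministic target policy `pi`, and the dataset layout are assumptions for illustration, not the differentiable approximators analyzed in this paper.

```python
import numpy as np

def fitted_q_evaluation(transitions, phi, pi, gamma, num_iters=100, ridge=1e-3):
    """Illustrative FQE with linear features: Q(s, a) ~ phi(s, a) @ w.

    transitions: list of (s, a, r, s_next) tuples logged by the behavior policy.
    phi: feature map (state, action) -> np.ndarray of shape (d,).
    pi: target policy, state -> action (deterministic, for simplicity).
    """
    d = phi(*transitions[0][:2]).shape[0]
    w = np.zeros(d)
    X = np.stack([phi(s, a) for (s, a, _, _) in transitions])        # (n, d)
    rewards = np.array([r for (_, _, r, _) in transitions])          # (n,)
    X_next = np.stack([phi(s2, pi(s2)) for (_, _, _, s2) in transitions])
    A = X.T @ X + ridge * np.eye(d)   # ridge-regularized normal equations
    for _ in range(num_iters):
        targets = rewards + gamma * (X_next @ w)   # Bellman backup under pi
        w = np.linalg.solve(A, X.T @ targets)      # regression step
    return w
```

The OPE estimate of the target policy's value at a start state `s0` would then be `phi(s0, pi(s0)) @ w`.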

Improved Bayesian regret bounds for Thompson sampling in reinforcement learning

A Moradipari, M Pedramfar… - Advances in …, 2023 - proceedings.neurips.cc
In this paper, we prove state-of-the-art Bayesian regret bounds for Thompson Sampling in
reinforcement learning in a multitude of settings. We present a refined analysis of the …
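As a reminder of the algorithm being analyzed, Thompson sampling draws a model from the posterior and acts greedily with respect to that draw. A minimal sketch in the Bernoulli bandit special case (the paper treats full reinforcement-learning settings, so this is background rather than the analyzed algorithm):

```python
import numpy as np

def thompson_sampling_bernoulli(arm_probs, horizon, seed=0):
    """Thompson sampling for Bernoulli bandits with Beta(1, 1) priors.

    arm_probs: list of true success probabilities (unknown to the learner).
    """
    rng = np.random.default_rng(seed)
    successes = np.ones(len(arm_probs))   # Beta posterior alpha parameters
    failures = np.ones(len(arm_probs))    # Beta posterior beta parameters
    total_reward = 0.0
    for _ in range(horizon):
        samples = rng.beta(successes, failures)   # one posterior draw per arm
        a = int(np.argmax(samples))               # act greedily on the draw
        reward = float(rng.random() < arm_probs[a])
        successes[a] += reward
        failures[a] += 1.0 - reward
        total_reward += reward
    return total_reward
```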

Optimal and instance-dependent guarantees for Markovian linear stochastic approximation

W Mou, A Pananjady, MJ Wainwright… - arXiv preprint arXiv …, 2021 - arxiv.org
We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from …
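For background, a linear stochastic approximation scheme of this kind updates an iterate with noisy observations $(A_t, b_t)$ of a fixed point equation $\bar{A}\theta = \bar{b}$ and averages the iterates. A minimal sketch, assuming the observation stream is simply handed to us (the paper's analysis concerns observations generated along a Markov trajectory):

```python
import numpy as np

def averaged_lsa(observations, theta0, step_size):
    """Polyak-Ruppert averaged linear stochastic approximation.

    observations: iterable of (A_t, b_t) pairs, noisy versions of
    (A_bar, b_bar), targeting the fixed point A_bar @ theta = b_bar.
    """
    theta = theta0.copy()
    theta_bar = np.zeros_like(theta0)
    for t, (A_t, b_t) in enumerate(observations, start=1):
        theta = theta + step_size * (b_t - A_t @ theta)   # stochastic update
        theta_bar += (theta - theta_bar) / t              # running average
    return theta_bar
```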

Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency

W Mou, P Ding, MJ Wainwright, PL Bartlett - arXiv preprint arXiv …, 2023 - arxiv.org
We study optimal procedures for estimating a linear functional based on observational data.
In many problems of this kind, a widely used assumption is strict overlap, i.e., uniform …
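For background on why overlap assumptions appear in this literature, the classical inverse-propensity-weighted estimator divides by the propensity score, so its weights (and variance) blow up as overlap degrades. A minimal sketch of that baseline estimator (illustrative only; not the kernel-based procedure proposed in the paper):

```python
import numpy as np

def ipw_mean(outcomes, treatments, propensities):
    """Inverse-propensity-weighted estimate of the treated-outcome mean.

    outcomes, treatments (0/1), propensities: np.ndarrays of length n,
    with propensities[i] = P(T=1 | X_i). Weights 1/propensity grow
    without bound as overlap fails, motivating strict-overlap conditions.
    """
    weights = treatments / propensities
    return float(np.sum(weights * outcomes) / len(outcomes))
```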

Sample complexity of offline reinforcement learning with deep ReLU networks

T Nguyen-Tang, S Gupta, H Tran-The… - arXiv preprint arXiv …, 2021 - arxiv.org
Offline reinforcement learning (RL) leverages previously collected data for policy
optimization without any further active exploration. Despite the recent interest in this …

A finite-sample analysis of multi-step temporal difference estimates

Y Duan, MJ Wainwright - Learning for Dynamics and Control …, 2023 - proceedings.mlr.press
We consider the problem of estimating the value function of an infinite-horizon $\gamma$-discounted Markov reward process (MRP). We establish non-asymptotic guarantees for a …
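For reference, a $k$-step temporal difference update bootstraps a value estimate from $k$ observed rewards plus the current estimate at the state reached after $k$ steps. A minimal tabular sketch; the step size, trajectory layout, and tabular representation are simplifying assumptions, not the estimator analyzed in the paper:

```python
import numpy as np

def multi_step_td(trajectory, num_states, gamma, k, step_size=0.1):
    """Tabular k-step TD estimate of an MRP value function.

    trajectory: list of (state, reward) pairs from one sample path.
    """
    V = np.zeros(num_states)
    for t in range(len(trajectory) - k):
        s_t = trajectory[t][0]
        # k-step return: discounted rewards plus bootstrapped tail value.
        G = sum(gamma**i * trajectory[t + i][1] for i in range(k))
        G += gamma**k * V[trajectory[t + k][0]]
        V[s_t] += step_size * (G - V[s_t])   # TD update toward k-step target
    return V
```

Setting k = 1 recovers the classical TD(0) update, while larger k trades bootstrap bias for variance from the longer reward sum.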