Estimation and Inference in Distributional Reinforcement Learning

L Zhang, Y Peng, J Liang, W Yang, Z Zhang
arXiv preprint arXiv:2309.17262, 2023
In this paper, we study distributional reinforcement learning from the perspective of statistical efficiency. We investigate distributional policy evaluation, aiming to estimate the complete distribution of the random return (denoted $\eta^\pi$) attained by a given policy $\pi$. We use the certainty-equivalence method to construct our estimator $\hat\eta^\pi$, assuming a generative model is available. We show that in this setting a dataset of size $\widetilde O\left(\frac{|\mathcal{S}||\mathcal{A}|}{\epsilon^{2p}(1-\gamma)^{2p+2}}\right)$ is needed to guarantee that the $p$-Wasserstein metric between $\hat\eta^\pi$ and $\eta^\pi$ is less than $\epsilon$ with high probability. This implies that the distributional policy evaluation problem can be solved sample-efficiently. We also show that, under different mild assumptions, a dataset of size […] suffices to ensure that the Kolmogorov metric and the total variation metric between $\hat\eta^\pi$ and $\eta^\pi$ are below $\epsilon$ with high probability. Furthermore, we investigate the asymptotic behavior of $\hat\eta^\pi$. We demonstrate that the "empirical process" converges weakly to a Gaussian process in the space of bounded functionals on a Lipschitz function class, and also in the spaces of bounded functionals on an indicator function class and on a bounded measurable function class when some mild conditions hold. Our findings give rise to a unified approach to statistical inference for a wide class of statistical functionals of $\eta^\pi$.
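As a rough illustration of the certainty-equivalence idea described in the abstract, the following Python sketch draws a fixed number of next-state samples per state-action pair from a generative model, builds an empirical transition model, and approximates the return distribution of a policy in that empirical model. Everything in it is an assumption made for illustration and is not taken from the paper: the two-state toy MDP, the names `estimate_transitions`, `sample_return`, and `wasserstein_1`, the truncated horizon, and the use of Monte Carlo rollouts to approximate the return distributions.

```python
import random

def estimate_transitions(sample_next_state, n_states, n_actions, n_samples, rng):
    """Empirical transition probabilities from n_samples draws per (s, a)."""
    p_hat = [[[0.0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                p_hat[s][a][sample_next_state(s, a, rng)] += 1.0 / n_samples
    return p_hat

def sample_return(p, reward, policy, gamma, start, horizon, rng):
    """One truncated discounted return of `policy` under transition model `p`."""
    s, g, discount = start, 0.0, 1.0
    for _ in range(horizon):
        a = policy[s]
        g += discount * reward[s][a]
        discount *= gamma
        s = rng.choices(range(len(p)), weights=p[s][a])[0]
    return g

def wasserstein_1(xs, ys):
    """1-Wasserstein distance between two equal-size empirical samples on the real line."""
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

# Hypothetical toy MDP: 2 states, 2 actions, deterministic rewards in [0, 1].
true_p = [[[0.9, 0.1], [0.2, 0.8]], [[0.5, 0.5], [0.3, 0.7]]]
reward = [[1.0, 0.0], [0.0, 1.0]]
policy = [0, 1]   # deterministic policy: action taken in each state
gamma = 0.9
horizon = 60      # truncation; the neglected tail is of order gamma**horizon / (1 - gamma)

rng = random.Random(0)

def generative_model(s, a, rng):
    # Stand-in for the generative model: one next-state draw from the true MDP.
    return rng.choices(range(2), weights=true_p[s][a])[0]

p_hat = estimate_transitions(generative_model, 2, 2, n_samples=200, rng=rng)

# Monte Carlo approximations of the return distribution from state 0,
# under the empirical model (plug-in estimate) and under the true model.
returns_hat = [sample_return(p_hat, reward, policy, gamma, 0, horizon, rng) for _ in range(1000)]
returns_true = [sample_return(true_p, reward, policy, gamma, 0, horizon, rng) for _ in range(1000)]

w1 = wasserstein_1(returns_hat, returns_true)
print(f"approximate 1-Wasserstein distance: {w1:.3f}")
```

The printed distance mixes the model-estimation error with Monte Carlo and truncation error, so it only gives a qualitative sense of how the plug-in estimate tracks the true return distribution as `n_samples` grows.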