A review of off-policy evaluation in reinforcement learning

M Uehara, C Shi, N Kallus - arXiv preprint arXiv:2212.06355, 2022 - arxiv.org
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …

When is realizability sufficient for off-policy reinforcement learning?

A Zanette - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Understanding when reinforcement learning algorithms can make successful off-policy
predictions—and when they may fail to do so—remains an open problem. Typically, model …
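For reference, the realizability condition named in the title is the standard assumption that the chosen function class contains the evaluation policy's value function; a minimal statement (notation mine, not the paper's):

```latex
% Realizability: the function class \mathcal{F} contains the true
% action-value function of the evaluation policy \pi_e.
\[
Q^{\pi_e} \in \mathcal{F},
\qquad
Q^{\pi_e}(s,a) = \mathbb{E}\Bigl[\textstyle\sum_{t\ge 0}\gamma^{t} r_t \,\Bigm|\, s_0=s,\ a_0=a,\ \pi_e\Bigr].
\]
```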

Future-dependent value-based off-policy evaluation in POMDPs

M Uehara, H Kiyohara, A Bennett… - Advances in …, 2024 - proceedings.neurips.cc
We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general
function approximation. Existing methods such as sequential importance sampling …
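The sequential importance sampling baseline the snippet contrasts against can be sketched in a few lines; this is the generic per-decision form, not the paper's future-dependent estimator, and all interface names are illustrative:

```python
import numpy as np

def sis_estimate(trajectories, pi_e, pi_b, gamma=0.99):
    """Per-decision sequential importance sampling estimate of the
    evaluation policy's discounted return from behavior-policy data.

    trajectories: list of [(obs, action, reward), ...] under pi_b.
    pi_e, pi_b: callables (obs, action) -> action probability.
    """
    estimates = []
    for traj in trajectories:
        rho, ret = 1.0, 0.0
        for t, (obs, a, r) in enumerate(traj):
            rho *= pi_e(obs, a) / pi_b(obs, a)  # cumulative likelihood ratio
            ret += rho * gamma**t * r           # reweighted discounted reward
        estimates.append(ret)
    return float(np.mean(estimates))
```

The cumulative ratio rho is what drives the exponential-in-horizon variance that motivates the value-based alternative studied in the paper.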

Off-policy fitted Q-evaluation with differentiable function approximators: Z-estimation and inference theory

R Zhang, X Zhang, C Ni… - … Conference on Machine …, 2022 - proceedings.mlr.press
Off-Policy Evaluation (OPE) serves as one of the cornerstones in Reinforcement
Learning (RL). Fitted Q Evaluation (FQE) with various function approximators, especially …
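FQE itself is a simple iterated-regression scheme; a minimal sketch with a linear function approximator and a deterministic evaluation policy (both simplifying assumptions, with all names illustrative):

```python
import numpy as np

def fitted_q_evaluation(data, pi_e, phi, dim, gamma=0.99, iters=50):
    """Fitted Q Evaluation: repeatedly regress onto one-step Bellman
    backups under the evaluation policy pi_e.

    data: transitions (s, a, r, s_next) collected off-policy.
    phi: feature map (s, a) -> np.ndarray of length dim.
    Returns the weight vector of the fitted linear Q-function.
    """
    X = np.array([phi(s, a) for s, a, _, _ in data])
    # Ridge-regularized normal equations, precomputed once.
    A_inv = np.linalg.inv(X.T @ X + 1e-3 * np.eye(dim))
    w = np.zeros(dim)
    for _ in range(iters):
        # Regression targets: r + gamma * Q_k(s', pi_e(s')).
        y = np.array([r + gamma * phi(s2, pi_e(s2)) @ w
                      for _, _, r, s2 in data])
        w = A_inv @ (X.T @ y)
    return w
```

The paper's contribution concerns inference (asymptotic distributions and confidence intervals) for estimators of this form with differentiable approximators, which the sketch does not cover.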

Loss dynamics of temporal difference reinforcement learning

B Bordelon, P Masset, H Kuo… - Advances in Neural …, 2024 - proceedings.neurips.cc
Reinforcement learning has been successful across several applications in which agents
have to learn to act in environments with sparse feedback. However, despite this empirical …
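For context, the temporal difference update whose learning dynamics the paper analyzes is the standard TD(0) rule; a tabular sketch with an illustrative environment interface:

```python
import numpy as np

def td0_policy_evaluation(reset, step, policy, n_states,
                          episodes=500, alpha=0.1, gamma=0.99):
    """Tabular TD(0): move V(s) toward the bootstrapped one-step target.

    reset: () -> initial state (int in [0, n_states)).
    step: (s, a) -> (s_next, reward, done).
    policy: s -> a.
    """
    V = np.zeros(n_states)
    for _ in range(episodes):
        s, done = reset(), False
        while not done:
            s2, r, done = step(s, policy(s))
            target = r + (0.0 if done else gamma * V[s2])
            V[s] += alpha * (target - V[s])  # TD-error step
            s = s2
    return V
```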

Beyond the return: Off-policy function estimation under user-specified error-measuring distributions

A Huang, N Jiang - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Off-policy evaluation often refers to two related tasks: estimating the expected return of a
policy and estimating its value function (or other functions of interest, such as density ratios) …
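The two tasks the snippet distinguishes can be written side by side (notation illustrative):

```latex
% Return estimation targets a scalar; value-function estimation
% targets the whole function V^{\pi_e}.
\[
J(\pi_e) = \mathbb{E}_{s_0 \sim d_0}\bigl[V^{\pi_e}(s_0)\bigr],
\qquad
V^{\pi_e}(s) = \mathbb{E}\Bigl[\textstyle\sum_{t\ge 0}\gamma^{t} r_t \,\Bigm|\, s_0 = s,\ \pi_e\Bigr].
\]
```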

Robust offline policy evaluation and optimization with heavy-tailed rewards

J Zhu, R Wan, Z Qi, S Luo, C Shi - arXiv preprint arXiv:2310.18715, 2023 - arxiv.org
This paper endeavors to augment the robustness of offline reinforcement learning (RL) in
scenarios laden with heavy-tailed rewards, a prevalent circumstance in real-world …
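One standard heavy-tail-robust device in this literature is the median-of-means estimator, which replaces the sample mean of returns; the sketch below illustrates that generic idea, not necessarily the paper's specific method:

```python
import numpy as np

def median_of_means(returns, n_blocks=10, seed=None):
    """Median-of-means: split samples into blocks, average within each
    block, and return the median of the block means. Much less
    sensitive to heavy-tailed outliers than the plain sample mean."""
    rng = np.random.default_rng(seed)
    x = rng.permutation(np.asarray(returns, dtype=float))
    return float(np.median([b.mean() for b in np.array_split(x, n_blocks)]))
```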

Off-policy evaluation in doubly inhomogeneous environments

Z Bian, C Shi, Z Qi, L Wang - Journal of the American Statistical …, 2024 - Taylor & Francis
This work aims to study off-policy evaluation (OPE) under scenarios where two key
reinforcement learning (RL) assumptions—temporal stationarity and individual homogeneity …
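The two assumptions being relaxed can be stated for per-individual, per-time transition kernels P_t^(i) (notation illustrative):

```latex
% Temporal stationarity and individual homogeneity, respectively:
\[
P_t^{(i)} = P_{t'}^{(i)} \ \ \forall\, t, t'
\quad\text{(stationarity)},
\qquad
P_t^{(i)} = P_t^{(j)} \ \ \forall\, i, j
\quad\text{(homogeneity)}.
\]
```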

The optimal approximation factors in misspecified off-policy value function estimation

P Amortila, N Jiang… - … Conference on Machine …, 2023 - proceedings.mlr.press
Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative
blow-up factors with respect to the misspecification error of function approximation. Yet, the …
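A generic form of the guarantees being discussed, to make "multiplicative blow-up" concrete (illustrative, not the paper's exact statement):

```latex
% The estimation error inflates the best-in-class (misspecification)
% error by an approximation factor \alpha \ge 1, plus statistical error:
\[
\bigl\| \hat{Q} - Q^{\pi} \bigr\|
\;\le\;
\alpha \cdot \inf_{f \in \mathcal{F}} \bigl\| f - Q^{\pi} \bigr\|
\;+\; \varepsilon_{\mathrm{stat}}.
\]
```

The paper's question is which approximation factors alpha are achievable and optimal under different norms and data distributions.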

On the Curses of Future and History in Future-dependent Value Functions for Off-policy Evaluation

Y Zhang, N Jiang - arXiv preprint arXiv:2402.14703, 2024 - arxiv.org
We study off-policy evaluation (OPE) in partially observable environments with complex
observations, with the goal of developing estimators whose guarantee avoids exponential …