Borrowing from the future: An attempt to address double sampling

S Fujimoto, D Meger, D Precup, O Nachum… - arXiv preprint arXiv …, 2022 - arxiv.org

In this work, we study the use of the Bellman equation as a surrogate objective for value
prediction accuracy. While the Bellman equation is uniquely solved by the true value …

Cited by 36 Related articles All 3 versions

[PDF] neurips.cc

Risk-aware transfer in reinforcement learning using successor features

M Gimelfarb, A Barreto, S Sanner… - Advances in Neural …, 2021 - proceedings.neurips.cc

Sample efficiency and risk-awareness are central to the development of practical
reinforcement learning (RL) for complex decision-making. The former can be addressed by …

Cited by 24 Related articles All 10 versions

[PDF] arxiv.org

A note on optimization formulations of Markov decision processes

L Ying, Y Zhu - arXiv preprint arXiv:2012.09417, 2020 - arxiv.org

This note summarizes the optimization formulations used in the study of Markov decision
processes. We consider both the discounted and undiscounted processes under the …

Cited by 5 Related articles All 4 versions

[PDF] arxiv.org

PhiBE: A PDE-based Bellman Equation for Continuous Time Policy Evaluation

Y Zhu - arXiv preprint arXiv:2405.12535, 2024 - arxiv.org

In this paper, we address the problem of continuous-time reinforcement learning in
scenarios where the dynamics follow a stochastic differential equation. When the underlying …

Cited by 1 Related articles All 2 versions

[PDF] mcgill.ca