Why should I trust you, Bellman? The Bellman error is a poor replacement for value error

S Fujimoto, D Meger, D Precup, O Nachum… - arXiv preprint arXiv …, 2022 - arxiv.org
In this work, we study the use of the Bellman equation as a surrogate objective for value
prediction accuracy. While the Bellman equation is uniquely solved by the true value …

Risk-aware transfer in reinforcement learning using successor features

M Gimelfarb, A Barreto, S Sanner… - Advances in Neural …, 2021 - proceedings.neurips.cc
Sample efficiency and risk-awareness are central to the development of practical
reinforcement learning (RL) for complex decision-making. The former can be addressed by …

A note on optimization formulations of Markov decision processes

L Ying, Y Zhu - arXiv preprint arXiv:2012.09417, 2020 - arxiv.org
This note summarizes the optimization formulations used in the study of Markov decision
processes. We consider both the discounted and undiscounted processes under the …

PhiBE: A PDE-based Bellman Equation for Continuous Time Policy Evaluation

Y Zhu - arXiv preprint arXiv:2405.12535, 2024 - arxiv.org
In this paper, we address the problem of continuous-time reinforcement learning in
scenarios where the dynamics follow a stochastic differential equation. When the underlying …

Value estimation with finite data

S Fujimoto - 2024 - escholarship.mcgill.ca
This thesis investigates the intersection of reinforcement learning (RL) with function
approximation and limited data to develop practical, broadly applicable algorithms. Our …

Data-Driven Sequential Decision Making by Understanding and Adopting Rational Behavior

KH Kim - 2023 - search.proquest.com
A remarkable feature of an intelligent agent is the ability to make sequences of smart
decisions that are executed in coordination to reach goals. As can be seen by watching …

Applications of Fokker Planck Equations in Machine Learning Algorithms

Y Zhu - Young Researchers Conference, 2021 - Springer
As the continuous limit of the gradient-based optimization algorithms, Fokker Planck (FP)
equation can provide a qualitative description of the algorithm's behavior and give principled …

Why Should I Trust You, Bellman? Evaluating the Bellman Objective with Off-Policy Data

In this work, we analyze the effectiveness of the Bellman equation as a proxy objective for
value prediction accuracy in off-policy evaluation. While the Bellman equation is uniquely …