Why should I trust you, Bellman? The Bellman error is a poor replacement for value error
In this work, we study the use of the Bellman equation as a surrogate objective for value
prediction accuracy. While the Bellman equation is uniquely solved by the true value …
Risk-aware transfer in reinforcement learning using successor features
Sample efficiency and risk-awareness are central to the development of practical
reinforcement learning (RL) for complex decision-making. The former can be addressed by …
A note on optimization formulations of Markov decision processes
This note summarizes the optimization formulations used in the study of Markov decision
processes. We consider both the discounted and undiscounted processes under the …
PhiBE: A PDE-based Bellman Equation for Continuous Time Policy Evaluation
Y Zhu - arXiv preprint arXiv:2405.12535, 2024 - arxiv.org
In this paper, we address the problem of continuous-time reinforcement learning in
scenarios where the dynamics follow a stochastic differential equation. When the underlying …
Value estimation with finite data
S Fujimoto - 2024 - escholarship.mcgill.ca
This thesis investigates the intersection of reinforcement learning (RL) with function
approximation and limited data to develop practical, broadly applicable algorithms. Our …
Data-Driven Sequential Decision Making by Understanding and Adopting Rational Behavior
KH Kim - 2023 - search.proquest.com
A remarkable feature of an intelligent agent is the ability to make sequences of smart
decisions that are executed in coordination to reach goals. As can be seen by watching …
Applications of Fokker Planck Equations in Machine Learning Algorithms
Y Zhu - Young Researchers Conference, 2021 - Springer
As the continuous limit of gradient-based optimization algorithms, the Fokker-Planck (FP)
equation can provide a qualitative description of the algorithm's behavior and give principled …
Why Should I Trust You, Bellman? Evaluating the Bellman Objective with Off-Policy Data
In this work, we analyze the effectiveness of the Bellman equation as a proxy objective for
value prediction accuracy in off-policy evaluation. While the Bellman equation is uniquely …