Toward efficient gradient-based value estimation
A Sharifnassab, RS Sutton - International Conference on …, 2023 - proceedings.mlr.press
Gradient-based methods for value estimation in reinforcement learning have favorable
stability properties, but they are typically much slower than Temporal Difference (TD) …