Scale-free adaptive planning for deterministic dynamics & discounted rewards

P Bartlett, V Gabillon, J Healey… - … on Machine Learning, 2019 - proceedings.mlr.press
We address the problem of planning in an environment with deterministic dynamics and
stochastic discounted rewards under a limited numerical budget where the ranges of both
rewards and noise are unknown. We introduce PlaTγPOOS, an adaptive, robust, and
efficient alternative to the OLOP (open-loop optimistic planning) algorithm. Whereas OLOP
requires a priori knowledge of the ranges of both rewards and noise, PlaTγPOOS
dynamically adapts its behavior to both. This allows PlaTγPOOS to be immune to two …

[PDF] Scale-free adaptive planning for deterministic dynamics & discounted rewards

PLBVG Jennifer, HM Valko - researchers.lille.inria.fr