PROST: Probabilistic planning based on UCT
T Keller, P Eyerich - Proceedings of the International Conference on …, 2012 - ojs.aaai.org
T Keller, P Eyerich
Proceedings of the International Conference on Automated Planning and …, 2012•ojs.aaai.orgWe present PROST, a probabilistic planning system that is based on the UCT algorithm by
Kocsis and Szepesvari (2006), which has been applied successfully to many areas of
planning and acting under uncertainty. The objective of this paper is to show the application
of UCT to domain-independent probabilistic planning, an area it had not been applied to
before. We furthermore present several enhance-ments to the algorithm, including a method
that is able to drastically reduce the branching factor by identifying super-fluous actions. We …
Kocsis and Szepesvari (2006), which has been applied successfully to many areas of
planning and acting under uncertainty. The objective of this paper is to show the application
of UCT to domain-independent probabilistic planning, an area it had not been applied to
before. We furthermore present several enhance-ments to the algorithm, including a method
that is able to drastically reduce the branching factor by identifying super-fluous actions. We …
Abstract
We present PROST, a probabilistic planning system that is based on the UCT algorithm by Kocsis and Szepesvari (2006), which has been applied successfully to many areas of planning and acting under uncertainty. The objective of this paper is to show the application of UCT to domain-independent probabilistic planning, an area it had not been applied to before. We furthermore present several enhance-ments to the algorithm, including a method that is able to drastically reduce the branching factor by identifying super-fluous actions. We show how search depth limitation leads to a more thoroughly investigated search space in parts that are influential on the quality of a policy, and present a sound and polynomially computable detection of reward locks, states that correspond to, eg, dead ends or goals. We describe a general Q-value initialization for unvisited nodes in the search tree that circumvents the initial random walks inher-ent to UCT, and leads to a faster convergence on average. We demonstrate the significant influence of the enhancements by providing a comparison on the IPPC benchmark domains.
ojs.aaai.org
以上显示的是最相近的搜索结果。 查看全部搜索结果