Geometry and determinism of optimal stationary control in partially observable Markov decision processes
It is well known that for any finite state Markov decision process (MDP) there is a
memoryless deterministic policy that maximizes the expected reward. For partially
observable Markov decision processes (POMDPs), optimal memoryless policies are
generally stochastic. We study the expected reward optimization problem over the set of
memoryless stochastic policies. We formulate this as a constrained linear optimization
problem and develop a corresponding geometric framework. We show that any POMDP has …
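
The claim that optimal memoryless policies in a POMDP are generally stochastic can be illustrated with a minimal sketch. The toy problem below is hypothetical and not taken from the paper: two hidden states emit the same observation, so a memoryless policy reduces to a single number p, the probability of choosing action 0. The discounted reward criterion, the discount factor GAMMA, and the names step and discounted_value are all assumptions made for illustration only.

```python
# Toy POMDP (hypothetical, not from the paper): two hidden states that emit
# the same observation, so a memoryless policy is one number p = P(action 0).
GAMMA = 0.9  # discount factor (an assumption; the source does not fix a criterion)


def step(state, action):
    """Return (reward, next_state); the rewarded action equals the state index."""
    if action == state:
        return 1.0, 1 - state  # reward 1 and switch to the other state
    return 0.0, state          # reward 0 and stay in the same state


def discounted_value(p, iters=2000):
    """Expected discounted reward of policy p, averaged over a uniform start state."""
    v = [0.0, 0.0]
    for _ in range(iters):
        new_v = []
        for s in (0, 1):
            total = 0.0
            # The policy cannot see the state, so it uses the same p everywhere.
            for a, prob in ((0, p), (1, 1.0 - p)):
                r, s2 = step(s, a)
                total += prob * (r + GAMMA * v[s2])
            new_v.append(total)
        v = new_v
    return 0.5 * (v[0] + v[1])


# Compare the two deterministic memoryless policies with the uniform one.
values = {p: discounted_value(p) for p in (0.0, 0.5, 1.0)}
for p, val in values.items():
    print(f"p = {p}: value = {val:.3f}")
```

In this toy problem each deterministic policy collects at most one reward before getting stuck in a state where its fixed action earns nothing, while the uniformly random policy keeps earning reward in both states, so it scores strictly higher.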