Approximation benefits of policy gradient methods with aggregated states

D Russo - Management Science, 2023 - pubsonline.informs.org

Folklore suggests that policy gradient can be more robust to misspecification than its relative, approximate policy iteration. This paper studies the case of state-aggregated representations, in which the state space is partitioned and either the policy or the value function approximation is held constant over partitions. This paper shows that a policy gradient method converges to a policy whose regret per period is bounded by ϵ, the largest difference between two elements of the state-action value function belonging to a common partition. With the same representation, both approximate policy iteration and approximate value iteration can produce policies whose per-period regret scales as ϵ/(1−γ), where γ is a discount factor. Faced with inherent approximation error, methods that locally optimize the true decision objective can be far more robust.
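To make the quantity ϵ concrete, here is a minimal sketch that computes it from a small tabular state-action value function and a partition of the states, then sets it beside the ϵ/(1−γ) scaling quoted for approximate policy and value iteration. The function names `aggregation_error` and `regret_scales` are hypothetical, and the sketch assumes that "two elements belonging to a common partition" means entries for the same action at two states in the same cell; the paper's exact definition may differ. The comparison illustrates only the stated orders of magnitude, not the paper's theorems.

```python
from typing import Dict, List, Sequence, Tuple


def aggregation_error(q: Sequence[Sequence[float]], partition: Sequence[int]) -> float:
    """Largest gap |Q(s, a) - Q(s', a)| over states s, s' in the same cell.

    Assumption (not from the source): entries are compared for the same action a.
    q[s][a] is the state-action value; partition[s] is the cell index of state s.
    """
    cells: Dict[int, List[int]] = {}
    for state, cell in enumerate(partition):
        cells.setdefault(cell, []).append(state)

    eps = 0.0
    n_actions = len(q[0])
    for states in cells.values():
        for a in range(n_actions):
            values = [q[s][a] for s in states]
            # Within one cell, the largest pairwise gap is max minus min.
            eps = max(eps, max(values) - min(values))
    return eps


def regret_scales(eps: float, gamma: float) -> Tuple[float, float]:
    """Return (eps, eps / (1 - gamma)): the per-period regret orders quoted
    for policy gradient and for approximate policy/value iteration."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("discount factor gamma must lie in [0, 1)")
    return eps, eps / (1.0 - gamma)


# Three states, two actions; states 0 and 1 share cell 0, state 2 is alone.
q_table = [[1.0, 0.0], [1.5, 0.2], [3.0, 3.0]]
cells_of_states = [0, 0, 1]
eps = aggregation_error(q_table, cells_of_states)
print(eps, regret_scales(eps, gamma=0.75))
```

With γ = 0.75 the second bound is already four times the first, and the gap widens as γ approaches 1, which is the sense in which the abstract calls policy gradient "far more robust".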

This paper was accepted by Hamid Nazerzadeh, data science.

Supplemental Material: Data are available at https://doi.org/10.1287/mnsc.2023.4788.

INFORMS

Cited by 12 · Related articles · All 6 versions

以上显示的是最相近的搜索结果。查看全部搜索结果