Maximum a posteriori policy optimisation

Authors

Abbas Abdolmaleki, Jost Tobias Springenberg, Yuval Tassa, Remi Munos, Nicolas Heess, Martin Riedmiller

Publication date

2018/6/14

Journal

arXiv preprint arXiv:1806.06920

Description

We introduce a new algorithm for reinforcement learning called Maximum a posteriori Policy Optimisation (MPO) based on coordinate ascent on a relative entropy objective. We show that several existing methods can directly be related to our derivation. We develop two off-policy algorithms and demonstrate that they are competitive with the state-of-the-art in deep reinforcement learning. In particular, for continuous control, our method outperforms existing methods with respect to sample efficiency, premature convergence and robustness to hyperparameter settings while achieving similar or better final performance.

Total citations

Cited by 514

Citations per year: 2018: 15, 2019: 49, 2020: 75, 2021: 100, 2022: 100, 2023: 97, 2024: 76
