
[PDF] from mlr.press

Deterministic policy gradient algorithms

Authors

David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, Martin Riedmiller

Publication date

2014/6/21

Conference

ICML

Description

In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. Deterministic policy gradient algorithms outperformed their stochastic counterparts in several benchmark problems, particularly in high-dimensional action spaces.
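The core idea in the abstract is that the policy gradient for a deterministic policy mu_theta is the expected gradient of the action-value function: grad_theta J = E_s[ grad_theta mu_theta(s) * grad_a Q(s, a) evaluated at a = mu_theta(s) ]. Below is a minimal sketch of that idea in an off-policy actor-critic setting, under loudly stated assumptions: the one-step task (reward -(a - 2s)^2), the linear policy mu_theta(s) = theta * s, the hand-picked quadratic critic features, the least-squares critic fit, and all names (`reward`, `features`, `critic_action_grad`, `train`) are hypothetical choices for illustration. It is not the paper's COPDAC-Q algorithm, which uses a compatible linear critic trained by temporal-difference learning.

```python
import numpy as np

# Hypothetical one-step task (an illustrative assumption, not from the paper):
# state s ~ Uniform(-1, 1), reward r(s, a) = -(a - 2 s)^2, so the best action is a* = 2 s.
TARGET_GAIN = 2.0


def reward(s, a):
    return -(a - TARGET_GAIN * s) ** 2


def features(s, a):
    # Hypothetical quadratic features; the true action-value -a^2 + 4as - 4s^2
    # lies in their span, so a least-squares critic can represent it exactly.
    return np.stack([np.ones_like(s), s, a, s**2, s * a, a**2], axis=1)


def critic_action_grad(w, s, a):
    # dQ_w/da for the linear critic Q_w(s, a) = w . features(s, a)
    return w[2] + w[4] * s + 2.0 * w[5] * a


def train(iters=100, batch=256, sigma=0.5, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta = 0.0  # deterministic linear target policy mu_theta(s) = theta * s
    for _ in range(iters):
        s = rng.uniform(-1.0, 1.0, size=batch)
        # Exploratory behaviour policy: the deterministic action plus Gaussian noise.
        a = theta * s + sigma * rng.standard_normal(batch)
        r = reward(s, a)
        # Critic: regress Q_w(s, a) onto observed rewards (one-step task, no bootstrapping).
        w, *_ = np.linalg.lstsq(features(s, a), r, rcond=None)
        # Actor: deterministic policy gradient E[grad_theta mu(s) * dQ/da at a = mu(s)],
        # evaluated at the target action theta * s, not the noisy behaviour action.
        # For the linear policy, grad_theta mu(s) = s.
        grad = np.mean(s * critic_action_grad(w, s, theta * s))
        theta += lr * grad
    return theta


if __name__ == "__main__":
    print(f"learned gain: {train():.4f}")
```

Because this toy critic represents the true action-value exactly, the update reduces to theta += lr * 2 * (2 - theta) * mean(s^2), so the learned gain should approach the optimal value 2. The data are collected with noisy actions, while the actor gradient only needs grad_a Q at the target policy's own action; no integral over actions is required, which is the property the abstract credits for the efficiency of the deterministic gradient.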

Total citations

Cited by 5154

Citations per year: 2015: 23, 2016: 45, 2017: 153, 2018: 346, 2019: 591, 2020: 721, 2021: 809, 2022: 891, 2023: 1009, 2024: 538

Scholar articles

Deterministic policy gradient algorithms

D Silver, G Lever, N Heess, T Degris, D Wierstra… - International conference on machine learning, 2014

Cited by 5154   All 32 versions