Deterministic policy gradient adaptive dynamic programming for model-free optimal control

Y Zhang, B Zhao, D Liu - Neurocomputing, 2020 - Elsevier

Abstract

In this paper, a deterministic policy gradient adaptive dynamic programming (DPGADP) algorithm is proposed for solving model-free optimal control problems of discrete-time nonlinear systems. Using measured data, the developed algorithm improves the control performance with the policy gradient method. The convergence of the DPGADP algorithm is demonstrated by showing that the constructed Q-function sequence is monotonically non-increasing and converges to the optimal Q-function. An actor-critic neural network (NN) structure is established to implement the DPGADP algorithm. Experience replay and target network techniques from deep Q-learning are employed during the training process. The stability of the NN weight error dynamics is also analyzed. Finally, two simulation examples are presented to verify the effectiveness of the proposed method.
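
The abstract gives no equations, so the following is only a rough sketch of the general idea (a critic fitted from measured data, an actor improved by a deterministic policy gradient), not the paper's DPGADP algorithm. It assumes a hypothetical scalar linear plant with quadratic cost, replaces the actor and critic NNs with a linear gain and a quadratic Q-function, and omits experience replay minibatching and target networks. All names and constants are illustrative assumptions.

```python
import numpy as np

# Hypothetical scalar plant x' = a*x + b*u. It is used only to generate
# measured transitions; the learner never reads A_TRUE or B_TRUE.
A_TRUE, B_TRUE = 0.9, 0.5
Q_COST, R_COST = 1.0, 1.0  # stage cost q*x^2 + r*u^2


def features(x, u):
    """Quadratic features so that Q(x, u) = features(x, u) @ theta."""
    return np.stack([x * x, 2.0 * x * u, u * u], axis=1)


def collect_data(n, rng):
    """Measured transitions (x, u, cost, x_next) under random exploration."""
    x = rng.uniform(-1.0, 1.0, n)
    u = rng.uniform(-1.0, 1.0, n)
    x_next = A_TRUE * x + B_TRUE * u
    cost = Q_COST * x**2 + R_COST * u**2
    return x, u, cost, x_next


def evaluate_critic(k, data):
    """Fit Q for the policy u = -k*x by least squares on the Bellman equation
    Q(x, u) = cost + Q(x_next, -k*x_next)."""
    x, u, cost, x_next = data
    phi = features(x, u)
    phi_next = features(x_next, -k * x_next)
    theta, *_ = np.linalg.lstsq(phi - phi_next, cost, rcond=None)
    return theta  # [H_xx, H_xu, H_uu]


def actor_step(k, theta, data, lr=0.5):
    """One deterministic policy gradient step: chain rule dQ/du * du/dk."""
    x = data[0]
    _, h_xu, h_uu = theta
    u = -k * x
    dq_du = 2.0 * (h_xu * x + h_uu * u)  # gradient of Q w.r.t. the action
    du_dk = -x                           # gradient of the action w.r.t. the gain
    return k - lr * np.mean(dq_du * du_dk)


def train(outer_iters=30, actor_steps=20, seed=0):
    rng = np.random.default_rng(seed)
    data = collect_data(200, rng)
    k = 0.0  # assumed admissible start: |a - b*k| < 1 for this plant
    for _ in range(outer_iters):
        theta = evaluate_critic(k, data)
        for _ in range(actor_steps):
            k = actor_step(k, theta, data)
    return k


def riccati_gain(iters=2000):
    """Model-based reference gain, used only to check the learned gain."""
    p = 0.0
    for _ in range(iters):
        p = (Q_COST + A_TRUE**2 * p
             - (A_TRUE * B_TRUE * p) ** 2 / (R_COST + B_TRUE**2 * p))
    return A_TRUE * B_TRUE * p / (R_COST + B_TRUE**2 * p)


if __name__ == "__main__":
    print("learned gain:", train())
    print("Riccati gain:", riccati_gain())
```

In this toy setting the learned gain can be compared against the model-based Riccati gain; the nonlinear, NN-based case treated in the paper would need the techniques the abstract lists.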
