Deterministic policy gradient adaptive dynamic programming for model-free optimal control

Y Zhang, B Zhao, D Liu - Neurocomputing, 2020 - Elsevier
Abstract
In this paper, a deterministic policy gradient adaptive dynamic programming (DPGADP) algorithm is proposed for solving model-free optimal control problems of discrete-time nonlinear systems. Using measured data, the developed algorithm improves control performance via the policy gradient method. The convergence of the DPGADP algorithm is demonstrated by showing that the constructed Q-function sequence is monotonically non-increasing and converges to the optimal Q-function. An actor-critic neural network (NN) structure is established to implement the DPGADP algorithm. Experience replay and target network techniques from deep Q-learning are employed during training. The stability of the NN weight error dynamics is also analyzed. Finally, two simulation examples are presented to verify the effectiveness of the proposed method.
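To make the ingredients named in the abstract concrete (a critic fitted to measured data, a deterministic policy gradient actor update, experience replay and a target network), here is a minimal sketch under loudly stated assumptions. It is not the authors' DPGADP implementation. The plant is a hypothetical scalar linear system with quadratic stage cost and an assumed discount factor; the critic is a quadratic linear-in-features model fitted by least squares instead of a neural network; the actor is a linear policy u = theta * x; and the target network is hard-copied every outer iteration. All names (`plant_step`, `train`, `lqr_gain`, `A`, `B`, `GAMMA`, and the hyperparameters) are hypothetical. The learner only calls `plant_step` as a black box; `lqr_gain` uses the model solely to check the result.

```python
import random

# Hypothetical plant x_{k+1} = A*x_k + B*u_k. The learner treats plant_step as a
# black box that returns measured next states; it never reads A or B directly.
A, B = 1.2, 0.5
Q_COST, R_COST, GAMMA = 1.0, 1.0, 0.95  # stage cost q*x^2 + r*u^2; discount is assumed


def plant_step(x, u):
    return A * x + B * u


def stage_cost(x, u):
    return Q_COST * x * x + R_COST * u * u


def features(x, u):
    # Quadratic critic Q(x, u) = w0*x^2 + w1*x*u + w2*u^2 (stands in for a critic NN).
    return [x * x, x * u, u * u]


def q_value(w, x, u):
    return sum(wi * fi for wi, fi in zip(w, features(x, u)))


def solve3(m, v):
    # Gaussian elimination with partial pivoting for the 3x3 normal equations.
    aug = [row[:] + [vi] for row, vi in zip(m, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, 3):
            f = aug[r][col] / aug[col][col]
            for c in range(col, 4):
                aug[r][c] -= f * aug[col][c]
    w = [0.0] * 3
    for r in range(2, -1, -1):
        w[r] = (aug[r][3] - sum(aug[r][c] * w[c] for c in range(r + 1, 3))) / aug[r][r]
    return w


def train(iterations=60, batch=64, actor_steps=20, lr=0.5, seed=0):
    rng = random.Random(seed)
    # Experience replay buffer of measured transitions (x, u, cost, x_next),
    # gathered once with random exploratory states and inputs.
    buffer = []
    for _ in range(2000):
        x, u = rng.uniform(-1, 1), rng.uniform(-1, 1)
        buffer.append((x, u, stage_cost(x, u), plant_step(x, u)))

    w, theta = [0.0, 0.0, 0.0], 0.0   # critic weights; actor mu(x) = theta * x
    w_tgt, theta_tgt = w[:], theta    # target critic and target actor
    for _ in range(iterations):
        mb = rng.sample(buffer, batch)
        # Critic: least-squares fit of Q(x, u) to U(x, u) + gamma * Q_tgt(x', mu_tgt(x')).
        ata = [[0.0] * 3 for _ in range(3)]
        aty = [0.0] * 3
        for x, u, cost, xn in mb:
            y = cost + GAMMA * q_value(w_tgt, xn, theta_tgt * xn)
            phi = features(x, u)
            for i in range(3):
                aty[i] += phi[i] * y
                for j in range(3):
                    ata[i][j] += phi[i] * phi[j]
        w = solve3(ata, aty)
        # Actor: deterministic policy gradient dQ/du * dmu/dtheta, evaluated at u = theta*x.
        for _ in range(actor_steps):
            grad = sum((w[1] * x + 2 * w[2] * theta * x) * x for x, _u, _c, _xn in mb) / batch
            theta -= lr * grad  # gradient descent, since Q measures cost
        # Target network: periodic hard copy, as in DQN.
        w_tgt, theta_tgt = w[:], theta
    return w, theta


def lqr_gain(iters=500):
    # Model-based discounted Riccati recursion (uses A, B); for checking only.
    p = 0.0
    for _ in range(iters):
        k = GAMMA * A * B * p / (R_COST + GAMMA * B * B * p)
        p = Q_COST + GAMMA * A * A * p - GAMMA * A * B * p * k
    return GAMMA * A * B * p / (R_COST + GAMMA * B * B * p)


if __name__ == "__main__":
    w, theta = train()
    print("learned theta:", theta, " reference -K:", -lqr_gain())
```

Because each critic fit targets U + gamma * Q_tgt(x', mu_tgt(x')) and the actor is driven close to the minimizer of the new critic, each outer iteration approximately performs one step of Q-function value iteration. Starting from a zero critic, this sketch does not reproduce the paper's initialization or its monotonically non-increasing Q-function sequence; it only illustrates how the pieces fit together.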