Mixed policy gradient

Y Guan, J Duan, SE Li, J Li, J Chen… - arXiv preprint arXiv …, 2021 - arxiv.org
… This paper proposes mixed policy gradient (MPG) algorithm, which fuses the empirical
data and the transition model in policy gradient (PG) to accelerate convergence without …

Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning

SS Gu, T Lillicrap, RE Turner… - Advances in neural …, 2017 - proceedings.neurips.cc
… family of policy gradient methods that interpolate between on-policy and off-policy learning.
… our interpolated policy gradient method is the use of control variates to mix likelihood ratio …

A policy gradient algorithm for learning to learn in multiagent reinforcement learning

DK Kim, M Liu, MD Riemer, C Sun… - International …, 2021 - proceedings.mlr.press
gradient updates to consider both an agent’s own non-stationary policy dynamics and the
non-stationary policy … full spectrum of mixed incentive, competitive, and cooperative domains. …

Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient

S Li, Y Wu, X Cui, H Dong, F Fang, S Russell - Proceedings of the AAAI …, 2019 - aaai.org
… Multi-agent Deep Deterministic Policy Gradient (M3DDPG) with … policy gradient algorithm
(MADDPG), for robust policy learning; (… We focus on the four mixed cooperative and competitive …

FACMAC: Factored multi-agent centralised policy gradients

B Peng, T Rashid… - Advances in …, 2021 - proceedings.neurips.cc
… uses deep deterministic policy gradients to learn policies. However… In addition, FACMAC
uses a centralised policy gradient … mixed settings. We assume each agent a has a deterministic …

A Vehicle Path Planning Algorithm Based on Mixed Policy Gradient Actor‐Critic Model with Random Escape Term and Filter Optimization

W Nai, Z Yang, D Lin, D Li, Y Xing - Journal of Mathematics, 2022 - Wiley Online Library
… in the existing mixed policy gradient methods, this paper proposes a new mixed policy
gradient form and proposes a novel AC model on the basis of such mixed policy gradient. …

Mixing-time regularized policy gradient

T Morimura, T Osogami, T Shirai - … of the AAAI Conference on Artificial …, 2014 - ojs.aaai.org
… 4 Mixing-time regularized policy gradient We derive a framework of policy gradient with …
The results in the previous section indicate that, in order to compute some statistics for the policy

A collaborative multiagent reinforcement learning method based on policy gradient potential

Z Zhang, YS Ong, D Wang, B Xue - IEEE transactions on …, 2019 - ieeexplore.ieee.org
gradient-based MARL algorithms for identical interest games are quite few. In this article, we
propose a policy gradient … update, as opposed to the gradient itself, to learn the optimal joint …

QSOD: Hybrid policy gradient for deep multi-agent reinforcement learning

HMRU Rehman, BW On, DD Ningombam, S Yi… - IEEE …, 2021 - ieeexplore.ieee.org
… We introduce a hybrid policy gradient for deep MARL, known as Q-value Selection using
Optimization and DRL (QSOD), to mitigate this problem. It relies on a grey wolf optimizer (GWO) …

Policy gradients incorporating the future

D Venuto, E Lau, D Precup, O Nachum - arXiv preprint arXiv:2108.02096, 2021 - arxiv.org
… The blue line separates the data collection and policy gradient training steps in our algorithm
and the … Top: we show the training model where the policy gradient loss is calculated with …