Mixed policy gradient - Academic resource search

Mixed policy gradient

Y Guan, J Duan, SE Li, J Li, J Chen… - arXiv preprint arXiv …, 2021 - arxiv.org

… This paper proposes mixed policy gradient (MPG) algorithm, which fuses the empirical
data and the transition model in policy gradient (PG) to accelerate convergence without …

Cited by 22 - Related articles - All 3 versions

Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning

SS Gu, T Lillicrap, RE Turner… - Advances in neural …, 2017 - proceedings.neurips.cc

… family of policy gradient methods that interpolate between on-policy and off-policy learning.
… our interpolated policy gradient method is the use of control variates to mix likelihood ratio …

Cited by 194 - Related articles - All 16 versions

A policy gradient algorithm for learning to learn in multiagent reinforcement learning

DK Kim, M Liu, MD Riemer, C Sun… - International …, 2021 - proceedings.mlr.press

… gradient updates to consider both an agent’s own non-stationary policy dynamics and the
non-stationary policy … full spectrum of mixed incentive, competitive, and cooperative domains. …

Cited by 62 - Related articles - All 9 versions

Robust multi-agent reinforcement learning via minimax deep deterministic policy gradient

S Li, Y Wu, X Cui, H Dong, F Fang, S Russell - Proceedings of the AAAI …, 2019 - aaai.org

… Multi-agent Deep Deterministic Policy Gradient (M3DDPG) with … policy gradient algorithm
(MADDPG), for robust policy learning; (… We focus on the four mixed cooperative and competitive …

Cited by 346 - Related articles - All 16 versions

FACMAC: Factored multi-agent centralised policy gradients

B Peng, T Rashid… - Advances in …, 2021 - proceedings.neurips.cc

… uses deep deterministic policy gradients to learn policies. However… In addition, FACMAC
uses a centralised policy gradient … mixed settings. We assume each agent a has a deterministic …

Cited by 209 - Related articles - All 9 versions

A Vehicle Path Planning Algorithm Based on Mixed Policy Gradient Actor‐Critic Model with Random Escape Term and Filter Optimization

W Nai, Z Yang, D Lin, D Li, Y Xing - Journal of Mathematics, 2022 - Wiley Online Library

… in the existing mixed policy gradient methods, this paper proposes a new mixed policy
gradient form and proposes a novel AC model on the basis of such mixed policy gradient. …

Cited by 4 - Related articles - All 6 versions

Mixing-time regularized policy gradient

T Morimura, T Osogami, T Shirai - … of the AAAI Conference on Artificial …, 2014 - ojs.aaai.org

… 4 Mixing-time regularized policy gradient We derive a framework of policy gradient with …
The results in the previous section indicate that, in order to compute some statistics for the policy …

Cited by 7 - Related articles - All 10 versions

A collaborative multiagent reinforcement learning method based on policy gradient potential

Z Zhang, YS Ong, D Wang, B Xue - IEEE transactions on …, 2019 - ieeexplore.ieee.org

… gradient-based MARL algorithms for identical interest games are quite few. In this article, we
propose a policy gradient … update, as opposed to the gradient itself, to learn the optimal joint …

Cited by 36 - Related articles - All 3 versions

QSOD: Hybrid policy gradient for deep multi-agent reinforcement learning

HMRU Rehman, BW On, DD Ningombam, S Yi… - IEEE …, 2021 - ieeexplore.ieee.org

… We introduce a hybrid policy gradient for deep MARL, known as Q-value Selection using
Optimization and DRL (QSOD), to mitigate this problem. It relies on a grey wolf optimizer (GWO) …

Cited by 8 - Related articles - All 2 versions

Policy gradients incorporating the future

D Venuto, E Lau, D Precup, O Nachum - arXiv preprint arXiv:2108.02096, 2021 - arxiv.org

… The blue line separates the data collection and policy gradient training steps in our algorithm
and the … Top: we show the training model where the policy gradient loss is calculated with …

Cited by 13 - Related articles - All 4 versions