Shaping advice in deep multi-agent reinforcement learning

Z Chen, Y Zhou, RR Chen… - … Conference on Machine …, 2022 - proceedings.mlr.press

Actor-critic (AC) algorithms have been widely used in decentralized multi-agent systems to
learn the optimal joint control policy. However, existing decentralized AC algorithms either …

Cited by 31 Related articles All 7 versions

Difference advantage estimation for multi-agent policy gradients

Y Li, G Xie, Z Lu - International Conference on Machine …, 2022 - proceedings.mlr.press

Multi-agent policy gradient methods in centralized training with decentralized execution
have recently seen much progress. During centralized training, multi-agent credit …

Cited by 17 Related articles All 4 versions

Multi-agent advisor Q-learning

SG Subramanian, ME Taylor, K Larson… - Journal of Artificial …, 2022 - jair.org

In the last decade, there have been significant advances in multi-agent reinforcement
learning (MARL) but there are still numerous challenges, such as high sample complexity …

Cited by 12 Related articles All 12 versions

Potential-based Credit Assignment for Cooperative RL-based Testing of Autonomous Vehicles

U Ayvaz, CH Cheng, S Hao - 2023 International Joint …, 2023 - ieeexplore.ieee.org

While autonomous vehicles (AVs) may perform remarkably well in generic real-life cases,
their irrational action in some unforeseen cases leads to critical safety concerns. This paper …

Convergence Analysis of Minimax Optimization and Multiagent Reinforcement Learning

Z Chen - 2023 - search.proquest.com

This dissertation investigates two popular machine learning frameworks, namely, minimax
optimization and multiagent reinforcement learning (MARL). There are a large number of …

[PDF] Geometric Understanding of Reward Function in Multi-Agent Visual Exploration

M Hwang, O Kwon, S Oh - rllab.snu.ac.kr

Reward shaping has proven to be a powerful tool to improve an agent's performance in
single agent reinforcement learning. Recently, this method has also been applied in multi …

UTS: When Monotonic Value Factorisation Meets Non-monotonic and Stochastic Targets

Z Liu, L Wan, X Sui, X Chen, X Lan - openreview.net

Extracting decentralised policies from joint action-values is an attractive way to exploit
centralised learning. It is possible to apply monotonic value factorisation to guarantee …