Sample and communication-efficient decentralized actor-critic algorithms with finite-time analysis

Z Chen, Y Zhou, RR Chen… - … Conference on Machine …, 2022 - proceedings.mlr.press
Actor-critic (AC) algorithms have been widely used in decentralized multi-agent systems to
learn the optimal joint control policy. However, existing decentralized AC algorithms either …

Difference advantage estimation for multi-agent policy gradients

Y Li, G Xie, Z Lu - International Conference on Machine …, 2022 - proceedings.mlr.press
Multi-agent policy gradient methods in centralized training with decentralized execution
have recently seen much progress. During centralized training, multi-agent credit …

Multi-agent advisor Q-learning

SG Subramanian, ME Taylor, K Larson… - Journal of Artificial …, 2022 - jair.org
In the last decade, there have been significant advances in multi-agent reinforcement
learning (MARL) but there are still numerous challenges, such as high sample complexity …

Potential-based Credit Assignment for Cooperative RL-based Testing of Autonomous Vehicles

U Ayvaz, CH Cheng, S Hao - 2023 International Joint …, 2023 - ieeexplore.ieee.org
While autonomous vehicles (AVs) may perform remarkably well in generic real-life cases,
their irrational action in some unforeseen cases leads to critical safety concerns. This paper …

Convergence Analysis of Minimax Optimization and Multiagent Reinforcement Learning

Z Chen - 2023 - search.proquest.com
This dissertation investigates two popular machine learning frameworks, namely, minimax
optimization and multiagent reinforcement learning (MARL). There are a large number of …

Geometric Understanding of Reward Function in Multi-Agent Visual Exploration

M Hwang, O Kwon, S Oh - rllab.snu.ac.kr
Reward shaping has proven to be a powerful tool to improve an agent's performance in
single agent reinforcement learning. Recently, this method has also been applied in multi …

UTS: When Monotonic Value Factorisation Meets Non-monotonic and Stochastic Targets

Z Liu, L Wan, X Sui, X Chen, X Lan - openreview.net
Extracting decentralised policies from joint action-values is an attractive way to exploit
centralised learning. It is possible to apply monotonic value factorisation to guarantee …