Sample and communication-efficient decentralized actor-critic algorithms with finite-time analysis
Actor-critic (AC) algorithms have been widely used in decentralized multi-agent systems to
learn the optimal joint control policy. However, existing decentralized AC algorithms either …
Difference advantage estimation for multi-agent policy gradients
Multi-agent policy gradient methods in centralized training with decentralized execution
have recently seen considerable progress. During centralized training, multi-agent credit …
Multi-agent advisor Q-learning
In the last decade, there have been significant advances in multi-agent reinforcement
learning (MARL) but there are still numerous challenges, such as high sample complexity …
Potential-based Credit Assignment for Cooperative RL-based Testing of Autonomous Vehicles
While autonomous vehicles (AVs) may perform remarkably well in generic real-life cases,
their irrational action in some unforeseen cases leads to critical safety concerns. This paper …
Convergence Analysis of Minimax Optimization and Multiagent Reinforcement Learning
Z Chen - 2023 - search.proquest.com
This dissertation investigates two popular machine learning frameworks, namely, minimax
optimization and multiagent reinforcement learning (MARL). There are a large number of …
Geometric Understanding of Reward Function in Multi-Agent Visual Exploration
Reward shaping has proven to be a powerful tool to improve an agent's performance in
single agent reinforcement learning. Recently, this method has also been applied in multi …
UTS: When Monotonic Value Factorisation Meets Non-monotonic and Stochastic Targets
Z Liu, L Wan, X Sui, X Chen, X Lan - openreview.net
Extracting decentralised policies from joint action-values is an attractive way to exploit
centralised learning. It is possible to apply monotonic value factorisation to guarantee …