Policy finetuning: Bridging sample-efficient offline and online reinforcement learning

T Xie, N Jiang, H Wang, C Xiong… - Advances in neural …, 2021 - proceedings.neurips.cc
Recent theoretical work studies sample-efficient reinforcement learning (RL) extensively in
two settings: learning interactively in the environment (online RL), or learning from an offline …

Independent policy gradient methods for competitive reinforcement learning

C Daskalakis, DJ Foster… - Advances in neural …, 2020 - proceedings.neurips.cc
We obtain global, non-asymptotic convergence guarantees for independent learning
algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum …

A sharp analysis of model-based reinforcement learning with self-play

Q Liu, T Yu, Y Bai, C Jin - International Conference on …, 2021 - proceedings.mlr.press
Model-based algorithms—algorithms that explore the environment through building
and utilizing an estimated model—are widely used in reinforcement learning practice and …

V-Learning: A Simple, Efficient, Decentralized Algorithm for Multiagent RL

C Jin, Q Liu, Y Wang, T Yu - arXiv preprint arXiv:2110.14555, 2021 - arxiv.org
A major challenge of multiagent reinforcement learning (MARL) is the curse of multiagents,
where the size of the joint action space scales exponentially with the number of agents. This …

Independent learning in stochastic games

A Ozdaglar, MO Sayin, K Zhang - International Congress of …, 2021 - ems.press
Reinforcement learning (RL) has recently achieved tremendous successes in many artificial
intelligence applications. Many of the forefront applications of RL involve multiple agents …

When can we learn general-sum Markov games with a large number of players sample-efficiently?

Z Song, S Mei, Y Bai - arXiv preprint arXiv:2110.04184, 2021 - arxiv.org
Multi-agent reinforcement learning has made substantial empirical progress in solving
games with a large number of players. However, theoretically, the best known sample …

The complexity of Markov equilibrium in stochastic games

C Daskalakis, N Golowich… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We show that computing approximate stationary Markov coarse correlated equilibria (CCE)
in general-sum stochastic games is PPAD-hard, even when there are two players, the game …

Model-based multi-agent RL in zero-sum Markov games with near-optimal sample complexity

K Zhang, S Kakade, T Basar… - Advances in Neural …, 2020 - proceedings.neurips.cc
Model-based reinforcement learning (RL), which finds an optimal policy using an
empirical model, has long been recognized as one of the cornerstones of RL. It is especially …

Decentralized Q-learning in zero-sum Markov games

M Sayin, K Zhang, D Leslie, T Basar… - Advances in Neural …, 2021 - proceedings.neurips.cc
We study multi-agent reinforcement learning (MARL) in infinite-horizon discounted zero-sum
Markov games. We focus on the practical but challenging setting of decentralized MARL …

Last-iterate convergence of decentralized optimistic gradient descent/ascent in infinite-horizon competitive Markov games

CY Wei, CW Lee, M Zhang… - Conference on learning …, 2021 - proceedings.mlr.press
We study infinite-horizon discounted two-player zero-sum Markov games, and develop a
decentralized algorithm that provably converges to the set of Nash equilibria under self-play …