Learning with bandit feedback in potential games

P Mertikopoulos, B Lecouat, H Zenati, CS Foo… - arXiv preprint arXiv …, 2018 - arxiv.org

Owing to their connection with generative adversarial networks (GANs), saddle-point
problems have recently attracted considerable interest in machine learning and beyond. By …

Cited by 349 · Related articles · All 10 versions

Independent policy gradient for large-scale Markov potential games: Sharper rates, function approximation, and game-agnostic convergence

D Ding, CY Wei, K Zhang… - … Conference on Machine …, 2022 - proceedings.mlr.press

We examine global non-asymptotic convergence properties of policy gradient methods for
multi-agent reinforcement learning (RL) problems in Markov potential games (MPGs). To …

Cited by 85 · Related articles · All 8 versions

Global convergence of multi-agent policy gradient in Markov potential games

S Leonardos, W Overman, I Panageas… - arXiv preprint arXiv …, 2021 - arxiv.org

Potential games are arguably one of the most important and widely studied classes of
normal form games. They define the archetypal setting of multi-agent coordination as all …

Cited by 141 · Related articles · All 7 versions
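
The snippet above names potential games but is cut off before the definition. This is a minimal sketch that assumes the standard exact-potential definition: a game is an exact potential game if some function Φ exists such that every unilateral deviation changes the deviating player's payoff by exactly the change in Φ. The Python below checks that property by brute force on a 2x2 coordination game and lists the game's pure Nash equilibria. The payoff tables, the candidate `potential`, and all helper names are hypothetical illustrations and do not come from the paper.

```python
import itertools

# Hypothetical 2x2 game (illustrative assumption, not from the paper).
# payoffs[i][(a1, a2)] is player i's payoff when player 1 plays a1 and player 2 plays a2.
ACTIONS = (0, 1)
payoffs = [
    {(0, 0): 2, (0, 1): 0, (1, 0): 0, (1, 1): 1},  # player 1: coordination payoff
    {(0, 0): 5, (0, 1): 3, (1, 0): 0, (1, 1): 1},  # player 2: same, plus 3 whenever player 1 plays 0
]
# Candidate potential Phi(a1, a2) (hypothetical, chosen for illustration).
potential = {(0, 0): 2, (0, 1): 0, (1, 0): 0, (1, 1): 1}


def deviations(profile, actions):
    """Yield (i, new_profile) for every unilateral deviation from `profile`."""
    for i in range(len(profile)):
        for a in actions:
            if a != profile[i]:
                yield i, profile[:i] + (a,) + profile[i + 1:]


def is_exact_potential(payoffs, potential, actions):
    """True if u_i(b) - u_i(a) == Phi(b) - Phi(a) for every unilateral deviation a -> b."""
    n = len(payoffs)
    return all(
        payoffs[i][b] - payoffs[i][a] == potential[b] - potential[a]
        for a in itertools.product(actions, repeat=n)
        for i, b in deviations(a, actions)
    )


def pure_nash_equilibria(payoffs, actions):
    """Profiles where no player gains by deviating unilaterally."""
    n = len(payoffs)
    return [
        a
        for a in itertools.product(actions, repeat=n)
        if all(payoffs[i][b] <= payoffs[i][a] for i, b in deviations(a, actions))
    ]


print(is_exact_potential(payoffs, potential, ACTIONS))  # True
print(pure_nash_equilibria(payoffs, ACTIONS))  # [(0, 0), (1, 1)]
```

Player 2's payoff includes a bonus that depends only on player 1's action. The two players therefore do not have identical payoffs, but the same Φ still tracks every unilateral change in payoff. This is the sense in which incentives in a potential game are aligned.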

When can we learn general-sum Markov games with a large number of players sample-efficiently?

Z Song, S Mei, Y Bai - arXiv preprint arXiv:2110.04184, 2021 - arxiv.org

Multi-agent reinforcement learning has made substantial empirical progresses in solving
games with a large number of players. However, theoretically, the best known sample …

Cited by 113 · Related articles · All 3 versions

On improving model-free algorithms for decentralized multi-agent reinforcement learning

W Mao, L Yang, K Zhang… - … Conference on Machine …, 2022 - proceedings.mlr.press

Multi-agent reinforcement learning (MARL) algorithms often suffer from an exponential
sample complexity dependence on the number of agents, a phenomenon known as the …

Cited by 69 · Related articles · All 5 versions
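
One elementary source of the exponential dependence mentioned in the snippet is the size of the joint action space. A method that treats the tuple of all agents' actions as a single action must handle the product of the individual action counts. A method whose cost scales with each agent's own actions faces only their sum. The sketch below only does that arithmetic. The helper names and the choice of 4 actions per agent are hypothetical, and the code reproduces neither the paper's algorithm nor its bounds.

```python
from math import prod


def joint_action_count(actions_per_agent):
    """Size of the joint action space: the product of each agent's action count."""
    return prod(actions_per_agent)


def sum_of_individual_counts(actions_per_agent):
    """Total number of per-agent actions: grows only linearly in the number of agents."""
    return sum(actions_per_agent)


# Hypothetical example: every agent has 4 actions.
for n_agents in (2, 5, 10):
    counts = [4] * n_agents
    print(n_agents, joint_action_count(counts), sum_of_individual_counts(counts))
# 2 16 8
# 5 1024 20
# 10 1048576 40
```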

On last-iterate convergence beyond zero-sum games

I Anagnostides, I Panageas, G Farina… - International …, 2022 - proceedings.mlr.press

Most existing results about last-iterate convergence of learning dynamics are limited to two-
player zero-sum games, and only apply under rigid assumptions about what dynamics the …

Cited by 43 · Related articles · All 8 versions

Distributed multi-player bandits - a Game of Thrones approach

I Bistritz, A Leshem - Advances in Neural Information …, 2018 - proceedings.neurips.cc

We consider a multi-armed bandit game where N players compete for K arms for T turns.
Each player has different expected rewards for the arms, and the instantaneous rewards are …

Cited by 163 · Related articles · All 6 versions
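
To make the setting concrete, the sketch below computes the centralized benchmark for a tiny instance: the assignment of players to arms that maximizes the sum of expected rewards. It assumes that players who pick the same arm collide and all receive zero. The truncated snippet does not state this rule. The reward matrix `mu` and the function names are hypothetical. In the distributed problem the players do not know `mu` and must learn it from their own rewards, so this brute-force search shows only the target an algorithm would aim for.

```python
import itertools

# Hypothetical mean rewards: mu[n][k] is player n's expected reward on arm k (N = 2, K = 3).
mu = [
    [0.9, 0.8, 0.1],
    [0.85, 0.2, 0.3],
]


def expected_total_reward(assignment, mu):
    """Sum of expected rewards, where assignment[n] is the arm chosen by player n.

    ASSUMPTION (not stated in the snippet): every player on a shared arm collides and gets 0.
    """
    return sum(
        mu[n][k] for n, k in enumerate(assignment) if assignment.count(k) == 1
    )


def best_assignment(mu):
    """Brute-force the assignment maximizing total expected reward (K**N candidates)."""
    num_players, num_arms = len(mu), len(mu[0])
    return max(
        itertools.product(range(num_arms), repeat=num_players),
        key=lambda a: expected_total_reward(a, mu),
    )


best = best_assignment(mu)
print(best, expected_total_reward(best, mu))
```

Both players prefer arm 0, so if each greedily picks its favorite arm they collide and both get nothing. The optimum, (1, 0), sends player 0 to its second-best arm instead.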

Bandit learning in concave N-person games

M Bravo, D Leslie… - Advances in Neural …, 2018 - proceedings.neurips.cc

This paper examines the long-run behavior of learning with bandit feedback in non-
cooperative concave games. The bandit framework accounts for extremely low-information …

Cited by 145 · Related articles · All 14 versions
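
With bandit feedback, a player observes only the payoff it actually received, not a gradient. A standard way to get a gradient signal from a single payoff query is the one-point estimator, sketched below in one dimension. This is an assumed illustration: the snippet does not say which estimator the paper uses. In d dimensions the usual form is (d/δ)·f(x + δu)·u with u uniform on the unit sphere; in one dimension u reduces to a random sign. The concave payoff `f` and the helper names are hypothetical.

```python
import random


def one_point_gradient(f, x, delta, u):
    """Bandit (single-query) estimate of f'(x) in one dimension: f(x + delta*u) * u / delta.

    Only one payoff value, f(x + delta*u), is observed; u is a random sign in {-1, +1}.
    """
    return f(x + delta * u) * u / delta


def sample_estimate(f, x, delta, rng):
    """Draw the random sign and return one noisy estimate."""
    return one_point_gradient(f, x, delta, rng.choice((-1.0, 1.0)))


def expected_estimate(f, x, delta):
    """Exact expectation over the two equally likely signs."""
    return 0.5 * (one_point_gradient(f, x, delta, 1.0) + one_point_gradient(f, x, delta, -1.0))


def f(x):
    """Hypothetical concave payoff with its maximum at x = 1."""
    return -(x - 1.0) ** 2


rng = random.Random(0)
print(sample_estimate(f, 0.0, 0.1, rng))  # either about -8.1 or about 12.1
print(expected_estimate(f, 0.0, 0.1))     # about 2.0, the true slope f'(0)
```

Averaged over u, the estimate equals the central difference (f(x + δ) - f(x - δ)) / (2δ), which is exact for a quadratic. The individual estimates, however, are -8.1 and 12.1 here against a true slope of 2. That variance is one reason bandit feedback is harder to learn from than gradient feedback.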

The limits of min-max optimization algorithms: Convergence to spurious non-critical sets

YP Hsieh, P Mertikopoulos… - … Conference on Machine …, 2021 - proceedings.mlr.press

Compared to minimization, the min-max optimization in machine learning applications is
considerably more convoluted because of the existence of cycles and similar phenomena …

Cited by 109 · Related articles · All 13 versions
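
A standard minimal example of the cycling the snippet refers to is the bilinear problem: minimize over x and maximize over y the function f(x, y) = x·y. Its only critical point is the saddle point (0, 0). The sketch below runs simultaneous gradient descent-ascent on this problem. Each step multiplies x² + y² by exactly (1 + η²), so the iterates spiral away from the saddle point instead of converging. The step size and starting point are arbitrary choices. The example illustrates the general phenomenon, not the paper's specific construction of spurious non-critical sets.

```python
import math


def gda_step(x, y, eta):
    """One simultaneous gradient descent-ascent step on f(x, y) = x * y.

    x descends along df/dx = y; y ascends along df/dy = x (both use the old values).
    """
    return x - eta * y, y + eta * x


def run_gda(x0, y0, eta, steps):
    """Return the distance to the saddle point (0, 0) after each step, starting from (x0, y0)."""
    x, y = x0, y0
    radii = [math.hypot(x, y)]
    for _ in range(steps):
        x, y = gda_step(x, y, eta)
        radii.append(math.hypot(x, y))
    return radii


radii = run_gda(1.0, 0.0, eta=0.1, steps=100)
print(radii[-1])
```

Starting from (1, 0) with η = 0.1, the distance to the saddle point after 100 steps is (1.01)^50 ≈ 1.64, and it grows at every step.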

The confluence of networks, games, and learning: a game-theoretic framework for multiagent decision making over networks

T Li, G Peng, Q Zhu, T Başar - IEEE Control Systems Magazine, 2022 - ieeexplore.ieee.org

Multiagent decision making over networks has recently attracted an exponentially growing
number of researchers from the systems and control community. The area has gained …

Cited by 56 · Related articles · All 6 versions