Optimistic mirror descent in saddle-point problems: Going the extra (gradient) mile

P Mertikopoulos, B Lecouat, H Zenati, CS Foo… - arXiv preprint arXiv …, 2018 - arxiv.org
Owing to their connection with generative adversarial networks (GANs), saddle-point
problems have recently attracted considerable interest in machine learning and beyond. By …

Independent policy gradient for large-scale Markov potential games: Sharper rates, function approximation, and game-agnostic convergence

D Ding, CY Wei, K Zhang… - … Conference on Machine …, 2022 - proceedings.mlr.press
We examine global non-asymptotic convergence properties of policy gradient methods for
multi-agent reinforcement learning (RL) problems in Markov potential games (MPGs). To …

Global convergence of multi-agent policy gradient in Markov potential games

S Leonardos, W Overman, I Panageas… - arXiv preprint arXiv …, 2021 - arxiv.org
Potential games are arguably one of the most important and widely studied classes of
normal form games. They define the archetypal setting of multi-agent coordination as all …

When can we learn general-sum Markov games with a large number of players sample-efficiently?

Z Song, S Mei, Y Bai - arXiv preprint arXiv:2110.04184, 2021 - arxiv.org
Multi-agent reinforcement learning has made substantial empirical progresses in solving
games with a large number of players. However, theoretically, the best known sample …

On improving model-free algorithms for decentralized multi-agent reinforcement learning

W Mao, L Yang, K Zhang… - … Conference on Machine …, 2022 - proceedings.mlr.press
Multi-agent reinforcement learning (MARL) algorithms often suffer from an exponential
sample complexity dependence on the number of agents, a phenomenon known as the …

On last-iterate convergence beyond zero-sum games

I Anagnostides, I Panageas, G Farina… - International …, 2022 - proceedings.mlr.press
Most existing results about last-iterate convergence of learning dynamics are limited to
two-player zero-sum games, and only apply under rigid assumptions about what dynamics the …

Distributed multi-player bandits - a game of thrones approach

I Bistritz, A Leshem - Advances in Neural Information …, 2018 - proceedings.neurips.cc
We consider a multi-armed bandit game where N players compete for K arms for T turns.
Each player has different expected rewards for the arms, and the instantaneous rewards are …

Bandit learning in concave N-person games

M Bravo, D Leslie… - Advances in Neural …, 2018 - proceedings.neurips.cc
This paper examines the long-run behavior of learning with bandit feedback in
non-cooperative concave games. The bandit framework accounts for extremely low-information …

The limits of min-max optimization algorithms: Convergence to spurious non-critical sets

YP Hsieh, P Mertikopoulos… - … Conference on Machine …, 2021 - proceedings.mlr.press
Compared to minimization, the min-max optimization in machine learning applications is
considerably more convoluted because of the existence of cycles and similar phenomena …

The confluence of networks, games, and learning: a game-theoretic framework for multiagent decision making over networks

T Li, G Peng, Q Zhu, T Başar - IEEE Control Systems Magazine, 2022 - ieeexplore.ieee.org
Multiagent decision making over networks has recently attracted an exponentially growing
number of researchers from the systems and control community. The area has gained …