Policy finetuning: Bridging sample-efficient offline and online reinforcement learning

T Xie, N Jiang, H Wang, C Xiong… - Advances in neural …, 2021 - proceedings.neurips.cc
Recent theoretical work studies sample-efficient reinforcement learning (RL) extensively in
two settings: learning interactively in the environment (online RL), or learning from an offline …

Independent policy gradient methods for competitive reinforcement learning

C Daskalakis, DJ Foster… - Advances in neural …, 2020 - proceedings.neurips.cc
We obtain global, non-asymptotic convergence guarantees for independent learning
algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum …

A sharp analysis of model-based reinforcement learning with self-play

Q Liu, T Yu, Y Bai, C Jin - International Conference on …, 2021 - proceedings.mlr.press
Model-based algorithms—algorithms that explore the environment through building
and utilizing an estimated model—are widely used in reinforcement learning practice and …

V-Learning: A Simple, Efficient, Decentralized Algorithm for Multiagent RL

C Jin, Q Liu, Y Wang, T Yu - arXiv preprint arXiv:2110.14555, 2021 - arxiv.org
A major challenge of multiagent reinforcement learning (MARL) is the curse of multiagents,
where the size of the joint action space scales exponentially with the number of agents. This …

Independent learning in stochastic games

A Ozdaglar, MO Sayin, K Zhang - International Congress of …, 2021 - ems.press
Reinforcement learning (RL) has recently achieved tremendous successes in many artificial
intelligence applications. Many of the forefront applications of RL involve multiple agents …

When can we learn general-sum Markov games with a large number of players sample-efficiently?

Z Song, S Mei, Y Bai - arXiv preprint arXiv:2110.04184, 2021 - arxiv.org
Multi-agent reinforcement learning has made substantial empirical progress in solving
games with a large number of players. However, theoretically, the best known sample …

The complexity of Markov equilibrium in stochastic games

C Daskalakis, N Golowich… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We show that computing approximate stationary Markov coarse correlated equilibria (CCE)
in general-sum stochastic games is PPAD-hard, even when there are two players, the game …

Model-based multi-agent RL in zero-sum Markov games with near-optimal sample complexity

K Zhang, S Kakade, T Basar… - Advances in Neural …, 2020 - proceedings.neurips.cc
Model-based reinforcement learning (RL), which finds an optimal policy using an
empirical model, has long been recognized as one of the cornerstones of RL. It is especially …

Decentralized Q-learning in zero-sum Markov games

M Sayin, K Zhang, D Leslie, T Basar… - Advances in Neural …, 2021 - proceedings.neurips.cc
We study multi-agent reinforcement learning (MARL) in infinite-horizon discounted zero-sum
Markov games. We focus on the practical but challenging setting of decentralized MARL …

Last-iterate convergence of decentralized optimistic gradient descent/ascent in infinite-horizon competitive Markov games

CY Wei, CW Lee, M Zhang… - Conference on learning …, 2021 - proceedings.mlr.press
We study infinite-horizon discounted two-player zero-sum Markov games, and develop a
decentralized algorithm that provably converges to the set of Nash equilibria under self-play …