Fast global convergence of natural policy gradient methods with entropy regularization

S Cen, C Cheng, Y Chen, Y Wei… - Operations …, 2022 - pubsonline.informs.org
Natural policy gradient (NPG) methods are among the most widely used policy optimization
algorithms in contemporary reinforcement learning. This class of methods is often applied in …

Model-based multi-agent RL in zero-sum Markov games with near-optimal sample complexity

K Zhang, S Kakade, T Basar… - Advances in Neural …, 2020 - proceedings.neurips.cc
Model-based reinforcement learning (RL), which finds an optimal policy using an
empirical model, has long been recognized as one of the cornerstones of RL. It is especially …

Monte-Carlo tree search as regularized policy optimization

JB Grill, F Altché, Y Tang, T Hubert… - International …, 2020 - proceedings.mlr.press
The combination of Monte-Carlo tree search (MCTS) with deep reinforcement
learning has led to groundbreaking results in artificial intelligence. However, AlphaZero, the …

Fast rates for maximum entropy exploration

D Tiapkin, D Belomestny… - International …, 2023 - proceedings.mlr.press
We address the challenge of exploration in reinforcement learning (RL) when the agent
operates in an unknown environment with sparse or no rewards. In this work, we study the …

Efficient policy iteration for robust Markov decision processes via regularization

N Kumar, K Levy, K Wang, S Mannor - arXiv preprint arXiv:2205.14327, 2022 - arxiv.org
Robust Markov decision processes (MDPs) provide a general framework to model decision
problems where the system dynamics are changing or only partially known. Efficient …

Regularized RL

D Tiapkin, D Belomestny, D Calandriello… - arXiv preprint arXiv …, 2023 - arxiv.org
Incorporating expert demonstrations has empirically helped to improve the sample efficiency
of reinforcement learning (RL). This paper quantifies theoretically to what extent this extra …

Planning in Markov decision processes with gap-dependent sample complexity

A Jonsson, E Kaufmann, P Ménard… - Advances in …, 2020 - proceedings.neurips.cc
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for
planning in a Markov Decision Process in which transitions have a finite support. We prove …

Roping in Uncertainty: Robustness and Regularization in Markov Games

J McMahan, G Artiglio, Q Xie - arXiv preprint arXiv:2406.08847, 2024 - arxiv.org
We study robust Markov games (RMG) with $s$-rectangular uncertainty. We show a
general equivalence between computing a robust Nash equilibrium (RNE) of a $s …

Reinforcement learning for cost-aware Markov decision processes

W Suttle, K Zhang, Z Yang, J Liu… - … on Machine Learning, 2021 - proceedings.mlr.press
Ratio maximization has applications in areas as diverse as finance, reward shaping for
reinforcement learning (RL), and the development of safe artificial intelligence, yet there has …

Monte-Carlo graph search: the value of merging similar states

E Leurent, OA Maillard - Asian Conference on Machine …, 2020 - proceedings.mlr.press
We consider the problem of planning in a Markov Decision Process (MDP) with a generative
model and limited computational budget. Despite the underlying MDP transitions having a …