Fast global convergence of natural policy gradient methods with entropy regularization

S Cen, C Cheng, Y Chen, Y Wei… - Operations …, 2022 - pubsonline.informs.org
Natural policy gradient (NPG) methods are among the most widely used policy optimization
algorithms in contemporary reinforcement learning. This class of methods is often applied in …

Model-based multi-agent RL in zero-sum Markov games with near-optimal sample complexity

K Zhang, S Kakade, T Basar… - Advances in Neural …, 2020 - proceedings.neurips.cc
Model-based reinforcement learning (RL), which finds an optimal policy using an
empirical model, has long been recognized as one of the cornerstones of RL. It is especially …

Monte-Carlo tree search as regularized policy optimization

JB Grill, F Altché, Y Tang, T Hubert… - International …, 2020 - proceedings.mlr.press
The combination of Monte-Carlo tree search (MCTS) with deep reinforcement
learning has led to groundbreaking results in artificial intelligence. However, AlphaZero, the …

Fast rates for maximum entropy exploration

D Tiapkin, D Belomestny… - International …, 2023 - proceedings.mlr.press
We address the challenge of exploration in reinforcement learning (RL) when the agent
operates in an unknown environment with sparse or no rewards. In this work, we study the …

Efficient policy iteration for robust Markov decision processes via regularization

N Kumar, K Levy, K Wang, S Mannor - arXiv preprint arXiv:2205.14327, 2022 - arxiv.org
Robust Markov decision processes (MDPs) provide a general framework to model decision
problems where the system dynamics are changing or only partially known. Efficient …

Regularized RL

D Tiapkin, D Belomestny, D Calandriello… - arXiv preprint arXiv …, 2023 - arxiv.org
Incorporating expert demonstrations has empirically helped to improve the sample efficiency
of reinforcement learning (RL). This paper quantifies theoretically to what extent this extra …

Planning in Markov decision processes with gap-dependent sample complexity

A Jonsson, E Kaufmann, P Ménard… - Advances in …, 2020 - proceedings.neurips.cc
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for
planning in a Markov Decision Process in which transitions have a finite support. We prove …

Roping in Uncertainty: Robustness and Regularization in Markov Games

J McMahan, G Artiglio, Q Xie - arXiv preprint arXiv:2406.08847, 2024 - arxiv.org
We study robust Markov games (RMG) with $s$-rectangular uncertainty. We show a
general equivalence between computing a robust Nash equilibrium (RNE) of a $s …

Reinforcement learning for cost-aware Markov decision processes

W Suttle, K Zhang, Z Yang, J Liu… - … on Machine Learning, 2021 - proceedings.mlr.press
Ratio maximization has applications in areas as diverse as finance, reward shaping for
reinforcement learning (RL), and the development of safe artificial intelligence, yet there has …

Monte-Carlo graph search: the value of merging similar states

E Leurent, OA Maillard - Asian Conference on Machine …, 2020 - proceedings.mlr.press
We consider the problem of planning in a Markov Decision Process (MDP) with a generative
model and limited computational budget. Despite the underlying MDP transitions having a …