Fast global convergence of natural policy gradient methods with entropy regularization
Natural policy gradient (NPG) methods are among the most widely used policy optimization
algorithms in contemporary reinforcement learning. This class of methods is often applied in …
Model-based multi-agent RL in zero-sum Markov games with near-optimal sample complexity
Model-based reinforcement learning (RL), which finds an optimal policy using an
empirical model, has long been recognized as one of the cornerstones of RL. It is especially …
Monte-Carlo tree search as regularized policy optimization
The combination of Monte-Carlo tree search (MCTS) with deep reinforcement
learning has led to groundbreaking results in artificial intelligence. However, AlphaZero, the …
Fast rates for maximum entropy exploration
D Tiapkin, D Belomestny… - International …, 2023 - proceedings.mlr.press
We address the challenge of exploration in reinforcement learning (RL) when the agent
operates in an unknown environment with sparse or no rewards. In this work, we study the …
Efficient policy iteration for robust Markov decision processes via regularization
Robust Markov decision processes (MDPs) provide a general framework to model decision
problems where the system dynamics are changing or only partially known. Efficient …
Regularized RL
Incorporating expert demonstrations has empirically helped to improve the sample efficiency
of reinforcement learning (RL). This paper quantifies theoretically to what extent this extra …
Planning in Markov decision processes with gap-dependent sample complexity
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for
planning in a Markov Decision Process in which transitions have a finite support. We prove …
Roping in Uncertainty: Robustness and Regularization in Markov Games
We study robust Markov games (RMGs) with $s$-rectangular uncertainty. We show a
general equivalence between computing a robust Nash equilibrium (RNE) of an $s$ …
Reinforcement learning for cost-aware Markov decision processes
Ratio maximization has applications in areas as diverse as finance, reward shaping for
reinforcement learning (RL), and the development of safe artificial intelligence, yet there has …
Monte-Carlo graph search: the value of merging similar states
E Leurent, OA Maillard - Asian Conference on Machine …, 2020 - proceedings.mlr.press
We consider the problem of planning in a Markov Decision Process (MDP) with a generative
model and limited computational budget. Despite the underlying MDP transitions having a …