Stochastic policy gradient methods: Improved sample complexity for Fisher-non-degenerate policies
Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed
the development of their theoretical foundations. Despite the huge efforts directed at the …
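For orientation, below is a minimal sketch of the kind of stochastic policy-gradient update such sample-complexity analyses concern: REINFORCE with a tabular softmax policy on a two-state toy MDP. The MDP, step size, and horizon are illustrative assumptions, not taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy two-state, two-action MDP (illustrative only).
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.7, 0.3], [0.4, 0.6]]])  # P[s, a, s']
    R = np.array([[1.0, 0.0], [0.0, 1.0]])    # R[s, a]
    gamma, horizon, lr = 0.95, 50, 0.1

    theta = np.zeros((2, 2))                  # softmax policy parameters

    def pi(s, th):
        z = np.exp(th[s] - th[s].max())
        return z / z.sum()

    def rollout(th):
        """Sample one trajectory of (state, action, reward) triples."""
        s, traj = 0, []
        for _ in range(horizon):
            a = rng.choice(2, p=pi(s, th))
            traj.append((s, a, R[s, a]))
            s = rng.choice(2, p=P[s, a])
        return traj

    for it in range(500):
        traj = rollout(theta)
        g, G = np.zeros_like(theta), 0.0
        # REINFORCE: discounted return times the score function grad log pi.
        for t, (s, a, r) in reversed(list(enumerate(traj))):
            G = r + gamma * G
            score = -pi(s, theta)
            score[a] += 1.0                   # grad of log softmax
            g[s] += (gamma ** t) * G * score
        theta += lr * g                       # stochastic gradient ascent

Variance-reduced estimators and conditions like Fisher non-degeneracy refine the analysis of this basic loop; the code above shows only the plain Monte Carlo baseline.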
Linear convergence of natural policy gradient methods with log-linear policies
We consider infinite-horizon discounted Markov decision processes and study the
convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log …
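As a rough sketch, one NPG step for a log-linear policy pi_theta(a|s) proportional to exp(theta . phi(s, a)) preconditions the policy gradient by the estimated Fisher information matrix. The features phi, the toy sample set, and the damping term below are assumptions for illustration, not the paper's construction.

    import numpy as np

    rng = np.random.default_rng(1)
    d, nA = 4, 3
    theta = np.zeros(d)
    phi = rng.standard_normal((5, nA, d))    # toy features phi[s, a]

    def probs(s, th):
        z = phi[s] @ th
        z -= z.max()                         # numerical stability
        p = np.exp(z)
        return p / p.sum()

    def npg_step(th, samples, q_hat, eta=0.5, damping=1e-3):
        """One NPG update theta += eta * F^{-1} g, with the Fisher matrix F
        and the gradient g estimated from (s, a) samples and Q-estimates."""
        F = damping * np.eye(d)
        g = np.zeros(d)
        for (s, a) in samples:
            score = phi[s, a] - probs(s, th) @ phi[s]     # grad log pi
            F += np.outer(score, score) / len(samples)    # Fisher estimate
            g += q_hat[(s, a)] * score / len(samples)     # gradient estimate
        return th + eta * np.linalg.solve(F, g)

    # Hypothetical usage with made-up samples and Q-value estimates:
    theta = npg_step(theta, [(0, 1), (2, 0)], {(0, 1): 1.0, (2, 0): 0.3})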
Improved sample complexity analysis of natural policy gradient algorithm with general parameterization for infinite horizon discounted reward Markov decision …
WU Mondal, V Aggarwal - International Conference on …, 2024 - proceedings.mlr.press
We consider the problem of designing sample-efficient learning algorithms for infinite
horizon discounted reward Markov Decision Process. Specifically, we propose the …
A novel framework for policy mirror descent with general parameterization and linear convergence
Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe
their success to the use of parameterized policies. However, while theoretical guarantees …
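In the tabular case the PMD step with the KL (negative-entropy) mirror map has a simple closed form, multiplicative in exp(eta * Q); the Q-table and step size below are illustrative.

    import numpy as np

    def pmd_step(pi, Q, eta):
        """One PMD iteration with the KL mirror map:
        pi'(a|s) is proportional to pi(a|s) * exp(eta * Q(s, a))."""
        logits = np.log(pi) + eta * Q
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        new_pi = np.exp(logits)
        return new_pi / new_pi.sum(axis=1, keepdims=True)

    # Toy example: 2 states, 3 actions, arbitrary Q-values.
    pi = np.full((2, 3), 1 / 3)
    Q = np.array([[1.0, 0.2, 0.0],
                  [0.0, 0.5, 1.5]])
    for _ in range(10):
        pi = pmd_step(pi, Q, eta=1.0)
    print(pi.round(3))   # mass concentrates on argmax_a Q(s, a)

General parameterizations cannot apply this per-state update exactly; handling that gap is, roughly, what such frameworks address.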
Optimal convergence rate for exact policy mirror descent in discounted Markov decision processes
E Johnson, C Pike-Burke… - Advances in Neural …, 2023 - proceedings.neurips.cc
Policy Mirror Descent (PMD) is a general family of algorithms that covers a wide
range of novel and fundamental methods in reinforcement learning. Motivated by the …
Performance bounds for policy-based average reward reinforcement learning algorithms
Y Murthy, M Moharrami… - Advances in Neural …, 2023 - proceedings.neurips.cc
Many policy-based reinforcement learning (RL) algorithms can be viewed as instantiations
of approximate policy iteration (PI), i.e., where policy improvement and policy evaluation are …
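A compact sketch of approximate PI in this sense: policy evaluation perturbed by noise, followed by greedy improvement. For concreteness it uses a discounted toy MDP, although the paper's setting is average reward; the random dynamics and noise level are made up.

    import numpy as np

    rng = np.random.default_rng(2)
    nS, nA, gamma = 3, 2, 0.9
    P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # toy dynamics P[s, a, s']
    R = rng.random((nS, nA))

    def evaluate(pi, noise=0.05):
        """Approximate evaluation: solve (I - gamma * P_pi) V = R_pi exactly,
        then corrupt V to mimic estimation error."""
        P_pi = P[np.arange(nS), pi]
        R_pi = R[np.arange(nS), pi]
        V = np.linalg.solve(np.eye(nS) - gamma * P_pi, R_pi)
        return V + noise * rng.standard_normal(nS)

    pi = np.zeros(nS, dtype=int)
    for _ in range(20):
        V = evaluate(pi)                 # (approximate) policy evaluation
        Q = R + gamma * P @ V            # one-step lookahead Q[s, a]
        pi = Q.argmax(axis=1)            # greedy policy improvement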
Sample-Efficient Constrained Reinforcement Learning with General Parameterization
WU Mondal, V Aggarwal - arXiv preprint arXiv:2405.10624, 2024 - arxiv.org
We consider a constrained Markov Decision Problem (CMDP) where the goal of an agent is
to maximize the expected discounted sum of rewards over an infinite horizon while ensuring …
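One standard recipe for a CMDP, sketched below under many simplifying assumptions, is a primal-dual loop on the Lagrangian V_R + lambda * (V_C - b): maximize the scalarized reward, then take a projected descent step on the multiplier. The toy MDP, threshold b, step sizes, and the exact primal solver are illustrative stand-ins, not the paper's algorithm.

    import numpy as np

    rng = np.random.default_rng(3)
    nS, nA, gamma = 3, 2, 0.9
    P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # toy dynamics
    R = rng.random((nS, nA))    # reward to maximize
    C = rng.random((nS, nA))    # constraint signal: require V_C(start) >= b
    b, lam, eta = 2.0, 0.0, 0.05

    def solve(reward):
        """Best policy for a scalar reward (exact PI on the toy MDP)."""
        pi = np.zeros(nS, dtype=int)
        for _ in range(50):
            P_pi = P[np.arange(nS), pi]
            V = np.linalg.solve(np.eye(nS) - gamma * P_pi,
                                reward[np.arange(nS), pi])
            pi = (reward + gamma * P @ V).argmax(axis=1)
        return pi

    def value(pi, signal):
        """Discounted value of `signal` under pi, from start state 0."""
        P_pi = P[np.arange(nS), pi]
        V = np.linalg.solve(np.eye(nS) - gamma * P_pi,
                            signal[np.arange(nS), pi])
        return V[0]

    for _ in range(200):
        pi = solve(R + lam * C)      # primal step: maximize the Lagrangian
        lam = max(0.0, lam - eta * (value(pi, C) - b))   # projected dual step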
On the Convergence of Natural Policy Gradient and Mirror Descent-Like Policy Methods for Average-Reward MDPs
It is now well known that Natural Policy Gradient (NPG) globally converges for discounted-
reward MDPs in the tabular setting, with perfect value function estimates. However, the result …
Mean-Field Sampling for Cooperative Multi-Agent Reinforcement Learning
Designing efficient algorithms for multi-agent reinforcement learning (MARL) is
fundamentally challenging because the size of the joint state and action spaces …
Approximate Global Convergence of Independent Learning in Multi-Agent Systems
Independent learning (IL), despite being a popular approach in practice to achieve
scalability in large-scale multi-agent systems, usually lacks global convergence guarantees …
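A minimal sketch of IL in the stateless case: two agents run epsilon-greedy Q-updates on a shared payoff table, each treating the other as part of a nonstationary environment. The payoff matrix and learning rates are illustrative.

    import numpy as np

    rng = np.random.default_rng(4)

    # Cooperative two-agent matrix game with a shared payoff (illustrative).
    payoff = np.array([[1.0, 0.0],
                       [0.0, 2.0]])  # payoff[a1, a2]

    Q = [np.zeros(2), np.zeros(2)]   # each agent keeps only its own Q-values
    eps, lr = 0.1, 0.1

    for t in range(5000):
        # Independent epsilon-greedy selection: no coordination signal.
        acts = [rng.integers(2) if rng.random() < eps else int(q.argmax())
                for q in Q]
        r = payoff[acts[0], acts[1]]
        for i in range(2):
            # Each agent updates as if it faced a single-agent problem.
            Q[i][acts[i]] += lr * (r - Q[i][acts[i]])

    print([q.round(2) for q in Q])   # agents typically settle on action 1

Even in this tiny game convergence to the better joint action is not guaranteed, which is the kind of gap that global-convergence analyses such as this one aim to close.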