Stochastic policy gradient methods: Improved sample complexity for Fisher-non-degenerate policies
Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed
the development of their theoretical foundations. Despite the huge efforts directed at the …
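For orientation, below is a minimal sketch of the kind of stochastic policy-gradient update such sample-complexity analyses concern: REINFORCE with a tabular softmax policy on a two-state toy MDP. The MDP, step size, and horizon are illustrative assumptions, not taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy two-state, two-action MDP (illustrative only).
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.7, 0.3], [0.4, 0.6]]])  # P[s, a, s']
    R = np.array([[1.0, 0.0], [0.0, 1.0]])    # R[s, a]
    gamma, horizon, lr = 0.95, 50, 0.1

    theta = np.zeros((2, 2))                  # softmax policy parameters

    def pi(s, th):
        z = np.exp(th[s] - th[s].max())
        return z / z.sum()

    def rollout(th):
        """Sample one trajectory of (state, action, reward) triples."""
        s, traj = 0, []
        for _ in range(horizon):
            a = rng.choice(2, p=pi(s, th))
            traj.append((s, a, R[s, a]))
            s = rng.choice(2, p=P[s, a])
        return traj

    for it in range(500):
        traj = rollout(theta)
        g, G = np.zeros_like(theta), 0.0
        # REINFORCE: discounted return times the score function grad log pi.
        for t, (s, a, r) in reversed(list(enumerate(traj))):
            G = r + gamma * G
            score = -pi(s, theta)
            score[a] += 1.0                   # grad of log softmax
            g[s] += (gamma ** t) * G * score
        theta += lr * g                       # stochastic gradient ascent

Variance-reduced estimators and conditions like Fisher non-degeneracy refine the analysis of this basic loop; the code above shows only the plain Monte Carlo baseline.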
Linear convergence of natural policy gradient methods with log-linear policies
We consider infinite-horizon discounted Markov decision processes and study the
convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log …
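As a rough sketch, one NPG step for a log-linear policy pi_theta(a|s) proportional to exp(theta . phi(s, a)) preconditions the policy gradient by the estimated Fisher information matrix. The features phi, the toy sample set, and the damping term below are assumptions for illustration, not the paper's construction.

    import numpy as np

    rng = np.random.default_rng(1)
    d, nA = 4, 3
    theta = np.zeros(d)
    phi = rng.standard_normal((5, nA, d))    # toy features phi[s, a]

    def probs(s, th):
        z = phi[s] @ th
        z -= z.max()                         # numerical stability
        p = np.exp(z)
        return p / p.sum()

    def npg_step(th, samples, q_hat, eta=0.5, damping=1e-3):
        """One NPG update theta += eta * F^{-1} g, with the Fisher matrix F
        and the gradient g estimated from (s, a) samples and Q-estimates."""
        F = damping * np.eye(d)
        g = np.zeros(d)
        for (s, a) in samples:
            score = phi[s, a] - probs(s, th) @ phi[s]     # grad log pi
            F += np.outer(score, score) / len(samples)    # Fisher estimate
            g += q_hat[(s, a)] * score / len(samples)     # gradient estimate
        return th + eta * np.linalg.solve(F, g)

    # Hypothetical usage with made-up samples and Q-value estimates:
    theta = npg_step(theta, [(0, 1), (2, 0)], {(0, 1): 1.0, (2, 0): 0.3})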
Improved sample complexity analysis of natural policy gradient algorithm with general parameterization for infinite horizon discounted reward Markov decision …
WU Mondal, V Aggarwal - International Conference on …, 2024 - proceedings.mlr.press
We consider the problem of designing sample-efficient learning algorithms for infinite
horizon discounted reward Markov Decision Process. Specifically, we propose the …
A novel framework for policy mirror descent with general parameterization and linear convergence
Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe
their success to the use of parameterized policies. However, while theoretical guarantees …
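In the tabular case the PMD step with the KL (negative-entropy) mirror map has a simple closed form, multiplicative in exp(eta * Q); the Q-table and step size below are illustrative.

    import numpy as np

    def pmd_step(pi, Q, eta):
        """One PMD iteration with the KL mirror map:
        pi'(a|s) is proportional to pi(a|s) * exp(eta * Q(s, a))."""
        logits = np.log(pi) + eta * Q
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        new_pi = np.exp(logits)
        return new_pi / new_pi.sum(axis=1, keepdims=True)

    # Toy example: 2 states, 3 actions, arbitrary Q-values.
    pi = np.full((2, 3), 1 / 3)
    Q = np.array([[1.0, 0.2, 0.0],
                  [0.0, 0.5, 1.5]])
    for _ in range(10):
        pi = pmd_step(pi, Q, eta=1.0)
    print(pi.round(3))   # mass concentrates on argmax_a Q(s, a)

General parameterizations cannot apply this per-state update exactly; handling that gap is, roughly, what such frameworks address.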
Optimal convergence rate for exact policy mirror descent in discounted Markov decision processes
E Johnson, C Pike-Burke… - Advances in Neural …, 2023 - proceedings.neurips.cc
Policy Mirror Descent (PMD) is a general family of algorithms that covers a wide
range of novel and fundamental methods in reinforcement learning. Motivated by the …
Performance bounds for policy-based average reward reinforcement learning algorithms
Y Murthy, M Moharrami… - Advances in Neural …, 2023 - proceedings.neurips.cc
Many policy-based reinforcement learning (RL) algorithms can be viewed as instantiations
of approximate policy iteration (PI), i.e., where policy improvement and policy evaluation are …
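A compact sketch of approximate PI in this sense: policy evaluation perturbed by noise, followed by greedy improvement. For concreteness it uses a discounted toy MDP, although the paper's setting is average reward; the random dynamics and noise level are made up.

    import numpy as np

    rng = np.random.default_rng(2)
    nS, nA, gamma = 3, 2, 0.9
    P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # toy dynamics P[s, a, s']
    R = rng.random((nS, nA))

    def evaluate(pi, noise=0.05):
        """Approximate evaluation: solve (I - gamma * P_pi) V = R_pi exactly,
        then corrupt V to mimic estimation error."""
        P_pi = P[np.arange(nS), pi]
        R_pi = R[np.arange(nS), pi]
        V = np.linalg.solve(np.eye(nS) - gamma * P_pi, R_pi)
        return V + noise * rng.standard_normal(nS)

    pi = np.zeros(nS, dtype=int)
    for _ in range(20):
        V = evaluate(pi)                 # (approximate) policy evaluation
        Q = R + gamma * P @ V            # one-step lookahead Q[s, a]
        pi = Q.argmax(axis=1)            # greedy policy improvement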
Sample-Efficient Constrained Reinforcement Learning with General Parameterization
WU Mondal, V Aggarwal - arXiv preprint arXiv:2405.10624, 2024 - arxiv.org
We consider a constrained Markov Decision Problem (CMDP) where the goal of an agent is
to maximize the expected discounted sum of rewards over an infinite horizon while ensuring …
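One standard recipe for a CMDP, sketched below under many simplifying assumptions, is a primal-dual loop on the Lagrangian V_R + lambda * (V_C - b): maximize the scalarized reward, then take a projected descent step on the multiplier. The toy MDP, threshold b, step sizes, and the exact primal solver are illustrative stand-ins, not the paper's algorithm.

    import numpy as np

    rng = np.random.default_rng(3)
    nS, nA, gamma = 3, 2, 0.9
    P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # toy dynamics
    R = rng.random((nS, nA))    # reward to maximize
    C = rng.random((nS, nA))    # constraint signal: require V_C(start) >= b
    b, lam, eta = 2.0, 0.0, 0.05

    def solve(reward):
        """Best policy for a scalar reward (exact PI on the toy MDP)."""
        pi = np.zeros(nS, dtype=int)
        for _ in range(50):
            P_pi = P[np.arange(nS), pi]
            V = np.linalg.solve(np.eye(nS) - gamma * P_pi,
                                reward[np.arange(nS), pi])
            pi = (reward + gamma * P @ V).argmax(axis=1)
        return pi

    def value(pi, signal):
        """Discounted value of `signal` under pi, from start state 0."""
        P_pi = P[np.arange(nS), pi]
        V = np.linalg.solve(np.eye(nS) - gamma * P_pi,
                            signal[np.arange(nS), pi])
        return V[0]

    for _ in range(200):
        pi = solve(R + lam * C)      # primal step: maximize the Lagrangian
        lam = max(0.0, lam - eta * (value(pi, C) - b))   # projected dual step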
On the Convergence of Natural Policy Gradient and Mirror Descent-Like Policy Methods for Average-Reward MDPs
It is now well known that Natural Policy Gradient (NPG) globally converges for discounted-
reward MDPs in the tabular setting, with perfect value function estimates. However, the result …
Mean-Field Sampling for Cooperative Multi-Agent Reinforcement Learning
Designing efficient algorithms for multi-agent reinforcement learning (MARL) is
fundamentally challenging because the size of the joint state and action spaces …
Approximate Global Convergence of Independent Learning in Multi-Agent Systems
Independent learning (IL), despite being a popular approach in practice to achieve
scalability in large-scale multi-agent systems, usually lacks global convergence guarantees …
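A minimal sketch of IL in the stateless case: two agents run epsilon-greedy Q-updates on a shared payoff table, each treating the other as part of a nonstationary environment. The payoff matrix and learning rates are illustrative.

    import numpy as np

    rng = np.random.default_rng(4)

    # Cooperative two-agent matrix game with a shared payoff (illustrative).
    payoff = np.array([[1.0, 0.0],
                       [0.0, 2.0]])  # payoff[a1, a2]

    Q = [np.zeros(2), np.zeros(2)]   # each agent keeps only its own Q-values
    eps, lr = 0.1, 0.1

    for t in range(5000):
        # Independent epsilon-greedy selection: no coordination signal.
        acts = [rng.integers(2) if rng.random() < eps else int(q.argmax())
                for q in Q]
        r = payoff[acts[0], acts[1]]
        for i in range(2):
            # Each agent updates as if it faced a single-agent problem.
            Q[i][acts[i]] += lr * (r - Q[i][acts[i]])

    print([q.round(2) for q in Q])   # agents typically settle on action 1

Even in this tiny game convergence to the better joint action is not guaranteed, which is the kind of gap that global-convergence analyses such as this one aim to close.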