Stochastic policy gradient methods: Improved sample complexity for Fisher-non-degenerate policies

I Fatkhullin, A Barakat, A Kireeva… - … Conference on Machine …, 2023 - proceedings.mlr.press
Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed
the development of their theoretical foundations. Despite the huge efforts directed at the …
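
For orientation, the objects named in this title are usually defined as follows; this is a sketch of the standard stochastic policy gradient step and the Fisher-non-degeneracy condition as they commonly appear in this line of work, not text from the paper itself:

\[
\theta_{t+1} = \theta_t + \eta\, \widehat{\nabla} J(\theta_t),
\qquad
F(\theta) = \mathbb{E}_{(s,a)\sim \pi_\theta}\!\left[ \nabla_\theta \log \pi_\theta(a|s)\, \nabla_\theta \log \pi_\theta(a|s)^{\top} \right] \succeq \mu_F I,
\]

where J(\theta) is the expected discounted return, \widehat{\nabla} J(\theta_t) is a stochastic estimate of its gradient, and the assumed lower bound \mu_F > 0 on the Fisher information matrix F(\theta) is what "Fisher-non-degenerate" typically refers to.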

Linear convergence of natural policy gradient methods with log-linear policies

R Yuan, SS Du, RM Gower, A Lazaric… - arXiv preprint arXiv …, 2022 - arxiv.org
We consider infinite-horizon discounted Markov decision processes and study the
convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log …
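
As a reminder of the setting (standard definitions, not quoted from this entry), a log-linear policy over a feature map \phi(s,a) and the natural policy gradient step take the form

\[
\pi_\theta(a|s) = \frac{\exp\!\big(\theta^{\top}\phi(s,a)\big)}{\sum_{a'}\exp\!\big(\theta^{\top}\phi(s,a')\big)},
\qquad
\theta_{t+1} = \theta_t + \eta\, F(\theta_t)^{\dagger}\, \nabla_\theta J(\theta_t),
\]

where F(\theta)^{\dagger} is the Moore-Penrose pseudoinverse of the Fisher information matrix.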

Improved sample complexity analysis of natural policy gradient algorithm with general parameterization for infinite horizon discounted reward Markov decision …

WU Mondal, V Aggarwal - International Conference on …, 2024 - proceedings.mlr.press
We consider the problem of designing sample-efficient learning algorithms for infinite-horizon
discounted-reward Markov Decision Processes. Specifically, we propose the …

On the linear convergence of natural policy gradient algorithm

S Khodadadian, PR Jhunjhunwala… - 2021 60th IEEE …, 2021 - ieeexplore.ieee.org
Markov Decision Processes are classically solved using Value Iteration and Policy Iteration
algorithms. Recent interest in Reinforcement Learning has motivated the study of methods …
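
In the tabular softmax case considered in such analyses, the NPG iteration is commonly written as a multiplicative update on the policy itself (a standard identity, sketched here; step-size conventions vary across papers):

\[
\pi_{t+1}(a|s) \;\propto\; \pi_t(a|s)\,\exp\!\big(\eta\, Q^{\pi_t}(s,a)\big),
\]

where Q^{\pi_t} is the state-action value function of the current policy; writing the exponent with the advantage A^{\pi_t} instead changes only the per-state normalization. Linear convergence then refers to the geometric rate at which V^{\pi_t} approaches the optimal value V^{*}.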

Softmax policy gradient methods can take exponential time to converge

G Li, Y Wei, Y Chi, Y Gu… - Conference on Learning …, 2021 - proceedings.mlr.press
The softmax policy gradient (PG) method, which performs gradient ascent under softmax
policy parameterization, is arguably one of the de facto implementations of policy …
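
For reference, the parameterization in question and the plain gradient-ascent iteration it induces are (standard forms, not taken from the abstract):

\[
\pi_\theta(a|s) = \frac{\exp(\theta_{s,a})}{\sum_{a'}\exp(\theta_{s,a'})},
\qquad
\theta^{(t+1)} = \theta^{(t)} + \eta\, \nabla_\theta V^{\pi_{\theta^{(t)}}}(\mu),
\]

with one parameter \theta_{s,a} per state-action pair and \mu an initial-state distribution; the result is that this iteration, although convergent, can require a number of iterations that grows exponentially in problem parameters before it gets close to optimal.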

A natural actor-critic framework for zero-sum Markov games

A Alacaoglu, L Viano, N He… - … Conference on Machine …, 2022 - proceedings.mlr.press
We introduce algorithms based on natural actor-critic and analyze their sample complexity
for solving two-player zero-sum Markov games in the tabular case. Our results improve the …

A finite-sample analysis of payoff-based independent learning in zero-sum stochastic games

Z Chen, K Zhang, E Mazumdar… - Advances in …, 2024 - proceedings.neurips.cc
In this work, we study two-player zero-sum stochastic games and develop a variant of the
smoothed best-response learning dynamics that combines independent learning dynamics …

Finite-sample analysis of off-policy natural actor-critic algorithm

S Khodadadian, Z Chen… - … Conference on Machine …, 2021 - proceedings.mlr.press
In this paper, we provide finite-sample convergence guarantees for an off-policy variant of
the natural actor-critic (NAC) algorithm based on Importance Sampling. In particular, we …
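
To make the importance-sampling ingredient concrete, below is a minimal, self-contained sketch of an ordinary importance-sampling return estimator under a behavior policy; it illustrates the general technique rather than the specific off-policy NAC critic analyzed in the paper, and the function names and trajectory format are assumptions:

    import numpy as np

    def is_return_estimate(trajectories, pi_target, pi_behavior, gamma=0.99):
        """Ordinary importance-sampling estimate of the target policy's
        expected discounted return, using trajectories collected under a
        behavior policy. Each trajectory is a list of (state, action, reward)
        tuples; pi_target(a, s) and pi_behavior(a, s) return action probabilities."""
        estimates = []
        for traj in trajectories:
            rho = 1.0   # cumulative importance weight for the trajectory
            ret = 0.0   # discounted return of the trajectory
            for t, (s, a, r) in enumerate(traj):
                rho *= pi_target(a, s) / pi_behavior(a, s)
                ret += (gamma ** t) * r
            estimates.append(rho * ret)   # reweight the behavior-policy return
        return float(np.mean(estimates))

This estimator is unbiased whenever the behavior policy puts positive probability on every action the target policy can take, which is the usual coverage condition in off-policy analyses.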

Sample complexity of policy-based methods under off-policy sampling and linear function approximation

Z Chen, ST Maguluri - International Conference on Artificial …, 2022 - proceedings.mlr.press
In this work, we study policy-based methods for solving the reinforcement learning problem,
where off-policy sampling and linear function approximation are employed for policy …
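
Here, "linear function approximation" refers to a critic of the form sketched below (a standard setup, stated for orientation rather than taken from the paper):

\[
Q_w(s,a) \;=\; w^{\top}\phi(s,a),
\]

where \phi is a fixed feature map and only the weight vector w is estimated from the (off-policy) samples; the policy update then uses this approximate critic in place of exact values.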

Finite-sample analysis of off-policy natural actor–critic with linear function approximation

Z Chen, S Khodadadian… - IEEE Control Systems …, 2022 - ieeexplore.ieee.org
In this letter, we develop a novel variant of the natural actor-critic algorithm using off-policy
sampling and linear function approximation, and establish a sample complexity of …