Stochastic policy gradient methods: Improved sample complexity for Fisher-non-degenerate policies

I Fatkhullin, A Barakat, A Kireeva… - … Conference on Machine …, 2023 - proceedings.mlr.press
Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed
the development of their theoretical foundations. Despite the huge efforts directed at the …
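
For orientation, the objects named in this title are usually defined as follows; this is a sketch of the standard stochastic policy gradient step and the Fisher-non-degeneracy condition as they commonly appear in this line of work, not text from the paper itself:

\[
\theta_{t+1} = \theta_t + \eta\, \widehat{\nabla} J(\theta_t),
\qquad
F(\theta) = \mathbb{E}_{(s,a)\sim \pi_\theta}\!\left[ \nabla_\theta \log \pi_\theta(a|s)\, \nabla_\theta \log \pi_\theta(a|s)^{\top} \right] \succeq \mu_F I,
\]

where J(\theta) is the expected discounted return, \widehat{\nabla} J(\theta_t) is a stochastic estimate of its gradient, and the assumed lower bound \mu_F > 0 on the Fisher information matrix F(\theta) is what "Fisher-non-degenerate" typically refers to.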

Linear convergence of natural policy gradient methods with log-linear policies

R Yuan, SS Du, RM Gower, A Lazaric… - arXiv preprint arXiv …, 2022 - arxiv.org
We consider infinite-horizon discounted Markov decision processes and study the
convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log …
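
As a reminder of the setting (standard definitions, not quoted from this entry), a log-linear policy over a feature map \phi(s,a) and the natural policy gradient step take the form

\[
\pi_\theta(a|s) = \frac{\exp\!\big(\theta^{\top}\phi(s,a)\big)}{\sum_{a'}\exp\!\big(\theta^{\top}\phi(s,a')\big)},
\qquad
\theta_{t+1} = \theta_t + \eta\, F(\theta_t)^{\dagger}\, \nabla_\theta J(\theta_t),
\]

where F(\theta)^{\dagger} is the Moore-Penrose pseudoinverse of the Fisher information matrix.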

Improved sample complexity analysis of natural policy gradient algorithm with general parameterization for infinite horizon discounted reward Markov decision …

WU Mondal, V Aggarwal - International Conference on …, 2024 - proceedings.mlr.press
We consider the problem of designing sample-efficient learning algorithms for infinite-horizon
discounted-reward Markov Decision Processes. Specifically, we propose the …

On the linear convergence of natural policy gradient algorithm

S Khodadadian, PR Jhunjhunwala… - 2021 60th IEEE …, 2021 - ieeexplore.ieee.org
Markov Decision Processes are classically solved using Value Iteration and Policy Iteration
algorithms. Recent interest in Reinforcement Learning has motivated the study of methods …
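
In the tabular softmax case considered in such analyses, the NPG iteration is commonly written as a multiplicative update on the policy itself (a standard identity, sketched here; step-size conventions vary across papers):

\[
\pi_{t+1}(a|s) \;\propto\; \pi_t(a|s)\,\exp\!\big(\eta\, Q^{\pi_t}(s,a)\big),
\]

where Q^{\pi_t} is the state-action value function of the current policy; writing the exponent with the advantage A^{\pi_t} instead changes only the per-state normalization. Linear convergence then refers to the geometric rate at which V^{\pi_t} approaches the optimal value V^{*}.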

Softmax policy gradient methods can take exponential time to converge

G Li, Y Wei, Y Chi, Y Gu… - Conference on Learning …, 2021 - proceedings.mlr.press
The softmax policy gradient (PG) method, which performs gradient ascent under softmax
policy parameterization, is arguably one of the de facto implementations of policy …
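
For reference, the parameterization in question and the plain gradient-ascent iteration it induces are (standard forms, not taken from the abstract):

\[
\pi_\theta(a|s) = \frac{\exp(\theta_{s,a})}{\sum_{a'}\exp(\theta_{s,a'})},
\qquad
\theta^{(t+1)} = \theta^{(t)} + \eta\, \nabla_\theta V^{\pi_{\theta^{(t)}}}(\mu),
\]

with one parameter \theta_{s,a} per state-action pair and \mu an initial-state distribution; the result is that this iteration, although convergent, can require a number of iterations that grows exponentially in problem parameters before it gets close to optimal.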

A natural actor-critic framework for zero-sum Markov games

A Alacaoglu, L Viano, N He… - … Conference on Machine …, 2022 - proceedings.mlr.press
We introduce algorithms based on natural actor-critic and analyze their sample complexity
for solving two-player zero-sum Markov games in the tabular case. Our results improve the …

A finite-sample analysis of payoff-based independent learning in zero-sum stochastic games

Z Chen, K Zhang, E Mazumdar… - Advances in …, 2024 - proceedings.neurips.cc
In this work, we study two-player zero-sum stochastic games and develop a variant of the
smoothed best-response learning dynamics that combines independent learning dynamics …

Finite-sample analysis of off-policy natural actor-critic algorithm

S Khodadadian, Z Chen… - … Conference on Machine …, 2021 - proceedings.mlr.press
In this paper, we provide finite-sample convergence guarantees for an off-policy variant of
the natural actor-critic (NAC) algorithm based on Importance Sampling. In particular, we …
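
To make the importance-sampling ingredient concrete, below is a minimal, self-contained sketch of an ordinary importance-sampling return estimator under a behavior policy; it illustrates the general technique rather than the specific off-policy NAC critic analyzed in the paper, and the function names and trajectory format are assumptions:

    import numpy as np

    def is_return_estimate(trajectories, pi_target, pi_behavior, gamma=0.99):
        """Ordinary importance-sampling estimate of the target policy's
        expected discounted return, using trajectories collected under a
        behavior policy. Each trajectory is a list of (state, action, reward)
        tuples; pi_target(a, s) and pi_behavior(a, s) return action probabilities."""
        estimates = []
        for traj in trajectories:
            rho = 1.0   # cumulative importance weight for the trajectory
            ret = 0.0   # discounted return of the trajectory
            for t, (s, a, r) in enumerate(traj):
                rho *= pi_target(a, s) / pi_behavior(a, s)
                ret += (gamma ** t) * r
            estimates.append(rho * ret)   # reweight the behavior-policy return
        return float(np.mean(estimates))

This estimator is unbiased whenever the behavior policy puts positive probability on every action the target policy can take, which is the usual coverage condition in off-policy analyses.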

Sample complexity of policy-based methods under off-policy sampling and linear function approximation

Z Chen, ST Maguluri - International Conference on Artificial …, 2022 - proceedings.mlr.press
In this work, we study policy-based methods for solving the reinforcement learning problem,
where off-policy sampling and linear function approximation are employed for policy …
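
Here, "linear function approximation" refers to a critic of the form sketched below (a standard setup, stated for orientation rather than taken from the paper):

\[
Q_w(s,a) \;=\; w^{\top}\phi(s,a),
\]

where \phi is a fixed feature map and only the weight vector w is estimated from the (off-policy) samples; the policy update then uses this approximate critic in place of exact values.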

Finite-sample analysis of off-policy natural actor–critic with linear function approximation

Z Chen, S Khodadadian… - IEEE Control Systems …, 2022 - ieeexplore.ieee.org
In this letter, we develop a novel variant of the natural actor-critic algorithm using off-policy
sampling and linear function approximation, and establish a sample complexity of …