Federated reinforcement learning: Linear speedup under Markovian sampling
Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling
observations from the environment is usually split across multiple agents. However …
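A minimal sketch of the parallel-sampling idea this entry studies, assuming N agents each running TD(0) on their own Markovian trajectory with periodic server-side averaging; the environment API, feature map `phi`, and synchronization schedule are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def federated_td0(envs, phi, dim, num_rounds=100, local_steps=10,
                  alpha=0.05, gamma=0.99):
    """Hypothetical federated TD(0): each agent runs TD updates on its own
    Markovian trajectory; a server averages the weights every round."""
    theta = np.zeros(dim)                       # shared linear value weights
    states = [env.reset() for env in envs]      # assumed env API (illustrative)
    for _ in range(num_rounds):
        local_weights = []
        for i, env in enumerate(envs):
            w, s = theta.copy(), states[i]
            for _ in range(local_steps):        # local Markovian sampling
                s_next, r = env.step()          # fixed-policy transition (assumed)
                delta = r + gamma * phi(s_next) @ w - phi(s) @ w
                w += alpha * delta * phi(s)     # TD(0) semi-gradient step
                s = s_next
            states[i] = s
            local_weights.append(w)
        theta = np.mean(local_weights, axis=0)  # server-side averaging
    return theta
```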
Policy mirror descent for reinforcement learning: Linear convergence, new sampling complexity, and generalized problem classes
G Lan - Mathematical Programming, 2023 - Springer
We present new policy mirror descent (PMD) methods for solving reinforcement learning
(RL) problems with either strongly convex or general convex regularizers. By exploring the …
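For context, the prototypical PMD step updates the policy at each state by a mirror-descent step on the current action-value function; the rendering below uses the cost-minimization convention, with h the convex regularizer the abstract refers to and D a Bregman divergence (notation is a standard reconstruction, not quoted from the paper):

```latex
\pi_{k+1}(\cdot \mid s) \;=\; \arg\min_{p \,\in\, \Delta(\mathcal{A})}
\Big\{ \eta_k \big[ \langle Q^{\pi_k}(s,\cdot),\, p \rangle + h(p) \big]
\;+\; D\big(p,\ \pi_k(\cdot \mid s)\big) \Big\}
```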
Linear convergence of natural policy gradient methods with log-linear policies
We consider infinite-horizon discounted Markov decision processes and study the
convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log …
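A sketch of one Q-NPG-style step under a log-linear policy pi_theta(a|s) proportional to exp(theta . phi(s,a)): the natural-gradient direction reduces to a weighted least-squares fit of the action values onto the policy features. The function and argument names here are assumptions for illustration:

```python
import numpy as np

def q_npg_step(theta, Phi, Q_hat, weights, eta=0.1):
    """One hypothetical Q-NPG step for a log-linear policy.
    Phi:     (n, d) features of sampled (s, a) pairs
    Q_hat:   (n,)   estimated action values at those pairs
    weights: (n,)   state-action sampling distribution"""
    sw = np.sqrt(weights)
    # Weighted least squares: project Q_hat onto the policy's feature space.
    w, *_ = np.linalg.lstsq(Phi * sw[:, None], Q_hat * sw, rcond=None)
    return theta + eta * w   # the fitted weights give the natural-gradient step
```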
A novel framework for policy mirror descent with general parameterization and linear convergence
Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe
their success to the use of parameterized policies. However, while theoretical guarantees …
On the linear convergence of natural policy gradient algorithm
S Khodadadian, PR Jhunjhunwala… - 2021 60th IEEE …, 2021 - ieeexplore.ieee.org
Markov Decision Processes are classically solved using Value Iteration and Policy Iteration
algorithms. Recent interest in Reinforcement Learning has motivated the study of methods …
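In the tabular softmax setting these convergence analyses consider, the NPG update takes the standard multiplicative-weights form (written in common notation, not quoted from the paper):

```latex
\pi_{k+1}(a \mid s) \;=\;
\frac{\pi_k(a \mid s)\, \exp\!\big(\eta\, Q^{\pi_k}(s,a)\big)}
{\sum_{a'} \pi_k(a' \mid s)\, \exp\!\big(\eta\, Q^{\pi_k}(s,a')\big)}
```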
A Lyapunov theory for finite-sample guarantees of asynchronous Q-learning and TD-learning variants
Z Chen, ST Maguluri, S Shakkottai… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper develops a unified framework to study finite-sample convergence guarantees of
a large class of value-based asynchronous reinforcement learning (RL) algorithms. We do …
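The "asynchronous" qualifier means only the single state-action entry visited by one Markovian sample path is updated per step. A minimal sketch of such a Q-learning loop, with an assumed environment API and an epsilon-greedy behavior policy chosen for illustration:

```python
import numpy as np

def async_q_learning(env, n_states, n_actions, steps=10_000,
                     alpha=0.1, gamma=0.99, eps=0.1):
    """Hypothetical asynchronous Q-learning along one Markovian trajectory:
    exactly one (s, a) entry of the Q-table changes at each step."""
    Q = np.zeros((n_states, n_actions))
    s = env.reset()                              # assumed env API
    for _ in range(steps):
        if np.random.rand() < eps:               # epsilon-greedy behavior policy
            a = np.random.randint(n_actions)
        else:
            a = int(np.argmax(Q[s]))
        s_next, r = env.step(a)
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
    return Q
```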
A Lyapunov theory for finite-sample guarantees of Markovian stochastic approximation
Z Chen, ST Maguluri, S Shakkottai… - Operations …, 2024 - pubsonline.informs.org
This paper develops a unified Lyapunov framework for finite-sample analysis of a Markovian
stochastic approximation (SA) algorithm under a contraction operator with respect to an …
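The Markovian SA iteration studied here can be written in the standard form below, where {Y_k} is a Markov chain with stationary distribution mu and the expected operator is a contraction with respect to some norm (a generic rendering, not the paper's exact notation):

```latex
x_{k+1} \;=\; x_k \;+\; \alpha_k \big( F(x_k, Y_k) - x_k \big),
\qquad
\bar{F}(x) := \mathbb{E}_{Y \sim \mu}\!\big[ F(x, Y) \big]
\ \text{a contraction w.r.t.}\ \|\cdot\|_c
```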
A natural actor-critic framework for zero-sum Markov games
We introduce algorithms based on natural actor-critic and analyze their sample complexity
for solving two player zero-sum Markov games in the tabular case. Our results improve the …
Finite-sample analysis of two-time-scale natural actor–critic algorithm
Actor–critic style two-time-scale algorithms are one of the most popular methods in
reinforcement learning, and have seen great empirical success. However, their performance …
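"Two-time-scale" refers to coupled updates in which the critic moves on a faster step size than the actor; schematically (a generic form, not the paper's exact recursions):

```latex
\begin{aligned}
w_{k+1} &= w_k + \beta_k\, g_{\mathrm{critic}}(w_k, \theta_k; X_k)
&&\text{(fast time scale)}\\
\theta_{k+1} &= \theta_k + \alpha_k\, g_{\mathrm{actor}}(w_k, \theta_k; X_k)
&&\text{(slow time scale, } \alpha_k / \beta_k \to 0\text{)}
\end{aligned}
```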
Sample complexity of policy-based methods under off-policy sampling and linear function approximation
Z Chen, ST Maguluri - International Conference on Artificial …, 2022 - proceedings.mlr.press
In this work, we study policy-based methods for solving the reinforcement learning problem,
where off-policy sampling and linear function approximation are employed for policy …
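To illustrate the two ingredients named in the abstract, here is a plain off-policy TD(0) sketch with linear features and per-step importance ratios; this is an assumed illustration of off-policy evaluation with linear function approximation, not the paper's actual algorithm:

```python
import numpy as np

def off_policy_td0(trajectory, phi, dim, pi_target, pi_behavior,
                   alpha=0.05, gamma=0.99):
    """Hypothetical off-policy TD(0): evaluate a target policy from
    behavior-policy samples using importance-sampling corrections."""
    w = np.zeros(dim)
    for (s, a, r, s_next) in trajectory:           # behavior-policy samples
        rho = pi_target(a, s) / pi_behavior(a, s)  # importance-sampling ratio
        delta = r + gamma * phi(s_next) @ w - phi(s) @ w
        w += alpha * rho * delta * phi(s)          # corrected semi-gradient step
    return w
```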