Federated reinforcement learning: Linear speedup under Markovian sampling

S Khodadadian, P Sharma, G Joshi… - International …, 2022 - proceedings.mlr.press
Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling
observations from the environment is usually split across multiple agents. However …
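
A minimal sketch of the federated-sampling idea the snippet alludes to, assuming FedAvg-style periodic averaging of local TD(0) iterates on a synthetic MDP; the agent count, stepsize, and averaging schedule below are illustrative placeholders, not the paper's algorithm:

```python
# Illustrative sketch (not the paper's exact algorithm): N agents run local TD(0)
# on their own Markovian trajectories and periodically average their value estimates,
# in the spirit of federated sampling with infrequent communication.
import numpy as np

rng = np.random.default_rng(0)
S, N, gamma, alpha = 5, 4, 0.9, 0.05          # states, agents, discount, stepsize
P = rng.dirichlet(np.ones(S), size=S)          # transition matrix under a fixed policy
r = rng.uniform(size=S)                        # state rewards

def local_td0(v, s, steps):
    """Run TD(0) for `steps` transitions starting from state s; return (v, s)."""
    for _ in range(steps):
        s_next = rng.choice(S, p=P[s])
        v[s] += alpha * (r[s] + gamma * v[s_next] - v[s])
        s = s_next
    return v, s

values = np.zeros((N, S))                      # one value-function copy per agent
states = rng.integers(S, size=N)               # each agent runs its own Markov chain
for _ in range(200):                           # communication rounds
    for i in range(N):                         # local work between communications
        values[i], states[i] = local_td0(values[i], states[i], steps=10)
    values[:] = values.mean(axis=0)            # server averages and broadcasts

print("federated TD(0) estimate:", values[0].round(3))
```

The communication savings come from taking several local steps per round; the linear-speedup question is whether the averaged iterate behaves like a single agent with N times as much data.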

Policy mirror descent for reinforcement learning: Linear convergence, new sampling complexity, and generalized problem classes

G Lan - Mathematical programming, 2023 - Springer
We present new policy mirror descent (PMD) methods for solving reinforcement learning
(RL) problems with either strongly convex or general convex regularizers. By exploring the …
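
For context, the PMD step is usually written per state as a mirror-descent update over the probability simplex; the template below is the standard form with a generic regularizer and Bregman divergence, not necessarily the paper's exact notation:

$$\pi_{k+1}(\cdot\mid s)\;\in\;\arg\max_{p\in\Delta(\mathcal A)}\Big\{\eta_k\big[\langle Q^{\pi_k}(s,\cdot),\,p\rangle-h(p)\big]-D\big(p,\pi_k(\cdot\mid s)\big)\Big\},$$

where $h$ is the convex regularizer and $D$ a Bregman divergence; with $h\equiv 0$ and $D$ the KL divergence this reduces to the multiplicative-weights (natural policy gradient) update.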

Linear convergence of natural policy gradient methods with log-linear policies

R Yuan, SS Du, RM Gower, A Lazaric… - arXiv preprint arXiv …, 2022 - arxiv.org
We consider infinite-horizon discounted Markov decision processes and study the
convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log …
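
A standard way to write the log-linear policy class and the Q-NPG step whose rates are studied (assumed notation, not necessarily the paper's):

$$\pi_\theta(a\mid s)=\frac{\exp\big(\phi(s,a)^{\top}\theta\big)}{\sum_{a'}\exp\big(\phi(s,a')^{\top}\theta\big)},\qquad \theta_{k+1}=\theta_k+\eta_k\,w_k,\qquad w_k\in\arg\min_{w}\ \mathbb{E}_{(s,a)\sim d_k}\Big[\big(Q^{\pi_{\theta_k}}(s,a)-\phi(s,a)^{\top}w\big)^2\Big],$$

where $\phi$ are the features and $d_k$ is a state-action distribution induced by the current policy; the NPG variant regresses the advantage function instead of the Q-function.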

A novel framework for policy mirror descent with general parameterization and linear convergence

C Alfano, R Yuan, P Rebeschini - Advances in Neural …, 2023 - proceedings.neurips.cc
Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe
their success to the use of parameterized policies. However, while theoretical guarantees …

On the linear convergence of natural policy gradient algorithm

S Khodadadian, PR Jhunjhunwala… - 2021 60th IEEE …, 2021 - ieeexplore.ieee.org
Markov Decision Processes are classically solved using Value Iteration and Policy Iteration
algorithms. Recent interest in Reinforcement Learning has motivated the study of methods …

A Lyapunov theory for finite-sample guarantees of asynchronous Q-learning and TD-learning variants

Z Chen, ST Maguluri, S Shakkottai… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper develops a unified framework to study finite-sample convergence guarantees of
a large class of value-based asynchronous reinforcement learning (RL) algorithms. We do …
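
A minimal sketch of asynchronous (single-trajectory) Q-learning, the kind of value-based update such guarantees cover; the random MDP, behavior policy, and constant stepsize below are placeholders:

```python
# Illustrative sketch of standard asynchronous Q-learning (not the paper's analysis):
# one behavior policy generates a single Markovian trajectory, and at each step only
# the visited (state, action) entry of the Q-table is updated.
import numpy as np

rng = np.random.default_rng(1)
S, A, gamma, alpha = 4, 2, 0.9, 0.1
P = rng.dirichlet(np.ones(S), size=(S, A))     # P[s, a] is the next-state distribution
R = rng.uniform(size=(S, A))                   # deterministic rewards

Q = np.zeros((S, A))
s = 0
for t in range(50_000):
    a = rng.integers(A)                        # uniform-random behavior policy
    s_next = rng.choice(S, p=P[s, a])
    target = R[s, a] + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])      # only the visited entry is updated
    s = s_next

print("greedy policy:", Q.argmax(axis=1))
```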

A Lyapunov theory for finite-sample guarantees of Markovian stochastic approximation

Z Chen, ST Maguluri, S Shakkottai… - Operations …, 2024 - pubsonline.informs.org
This paper develops a unified Lyapunov framework for finite-sample analysis of a Markovian
stochastic approximation (SA) algorithm under a contraction operator with respect to an …
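
Loosely, the Markovian SA template analyzed there is the iteration

$$x_{k+1}=x_k+\epsilon_k\big(F(x_k,Y_k)-x_k\big),$$

where $\{Y_k\}$ is an ergodic Markov chain and the expected operator $\bar F(x)=\mathbb{E}_{Y\sim\mu}\big[F(x,Y)\big]$ is a contraction in some norm, so that TD- and Q-learning style updates arise as special cases; this is the generic form, and the paper's precise assumptions concern the contraction norm and the chain's mixing.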

A natural actor-critic framework for zero-sum Markov games

A Alacaoglu, L Viano, N He… - … Conference on Machine …, 2022 - proceedings.mlr.press
We introduce algorithms based on natural actor-critic and analyze their sample complexity
for solving two-player zero-sum Markov games in the tabular case. Our results improve the …

Finite-sample analysis of two-time-scale natural actor–critic algorithm

S Khodadadian, TT Doan, J Romberg… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Actor–critic style two-time-scale algorithms are among the most popular methods in
reinforcement learning and have seen great empirical success. However, their performance …
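
The two-time-scale structure referred to is, in one common instantiation, a TD critic on the fast timescale and a policy-gradient actor on the slow one (a generic template, not the paper's exact recursion):

$$w_{k+1}=w_k+\beta_k\big(r_k+\gamma\,\phi(s_{k+1})^{\top}w_k-\phi(s_k)^{\top}w_k\big)\phi(s_k),\qquad \theta_{k+1}=\theta_k+\alpha_k\,\hat g_k(w_{k+1}),\qquad \frac{\alpha_k}{\beta_k}\to 0,$$

where $\hat g_k$ is a (natural) policy-gradient estimate built from the critic; the vanishing stepsize ratio is what lets the critic effectively track the slowly moving actor.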

Sample complexity of policy-based methods under off-policy sampling and linear function approximation

Z Chen, ST Maguluri - International Conference on Artificial …, 2022 - proceedings.mlr.press
In this work, we study policy-based methods for solving the reinforcement learning problem,
where off-policy sampling and linear function approximation are employed for policy …