Federated reinforcement learning: Linear speedup under Markovian sampling
Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling
observations from the environment is usually split across multiple agents. However …
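A minimal sketch of the parallel-sampling idea this entry studies, assuming N agents each running TD(0) on their own Markovian trajectory with periodic server-side averaging; the environment API, feature map `phi`, and synchronization schedule are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def federated_td0(envs, phi, dim, num_rounds=100, local_steps=10,
                  alpha=0.05, gamma=0.99):
    """Hypothetical federated TD(0): each agent runs TD updates on its own
    Markovian trajectory; a server averages the weights every round."""
    theta = np.zeros(dim)                       # shared linear value weights
    states = [env.reset() for env in envs]      # assumed env API (illustrative)
    for _ in range(num_rounds):
        local_weights = []
        for i, env in enumerate(envs):
            w, s = theta.copy(), states[i]
            for _ in range(local_steps):        # local Markovian sampling
                s_next, r = env.step()          # fixed-policy transition (assumed)
                delta = r + gamma * phi(s_next) @ w - phi(s) @ w
                w += alpha * delta * phi(s)     # TD(0) semi-gradient step
                s = s_next
            states[i] = s
            local_weights.append(w)
        theta = np.mean(local_weights, axis=0)  # server-side averaging
    return theta
```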
Policy mirror descent for reinforcement learning: Linear convergence, new sampling complexity, and generalized problem classes
G Lan - Mathematical Programming, 2023 - Springer
We present new policy mirror descent (PMD) methods for solving reinforcement learning
(RL) problems with either strongly convex or general convex regularizers. By exploring the …
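For context, the prototypical PMD step updates the policy at each state by a mirror-descent step on the current action-value function; the rendering below uses the cost-minimization convention, with h the convex regularizer the abstract refers to and D a Bregman divergence (notation is a standard reconstruction, not quoted from the paper):

```latex
\pi_{k+1}(\cdot \mid s) \;=\; \arg\min_{p \,\in\, \Delta(\mathcal{A})}
\Big\{ \eta_k \big[ \langle Q^{\pi_k}(s,\cdot),\, p \rangle + h(p) \big]
\;+\; D\big(p,\ \pi_k(\cdot \mid s)\big) \Big\}
```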
Linear convergence of natural policy gradient methods with log-linear policies
We consider infinite-horizon discounted Markov decision processes and study the
convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log …
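A sketch of one Q-NPG-style step under a log-linear policy pi_theta(a|s) proportional to exp(theta . phi(s,a)): the natural-gradient direction reduces to a weighted least-squares fit of the action values onto the policy features. The function and argument names here are assumptions for illustration:

```python
import numpy as np

def q_npg_step(theta, Phi, Q_hat, weights, eta=0.1):
    """One hypothetical Q-NPG step for a log-linear policy.
    Phi:     (n, d) features of sampled (s, a) pairs
    Q_hat:   (n,)   estimated action values at those pairs
    weights: (n,)   state-action sampling distribution"""
    sw = np.sqrt(weights)
    # Weighted least squares: project Q_hat onto the policy's feature space.
    w, *_ = np.linalg.lstsq(Phi * sw[:, None], Q_hat * sw, rcond=None)
    return theta + eta * w   # the fitted weights give the natural-gradient step
```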
A novel framework for policy mirror descent with general parameterization and linear convergence
Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe
their success to the use of parameterized policies. However, while theoretical guarantees …
On the linear convergence of natural policy gradient algorithm
S Khodadadian, PR Jhunjhunwala… - 2021 60th IEEE …, 2021 - ieeexplore.ieee.org
Markov Decision Processes are classically solved using Value Iteration and Policy Iteration
algorithms. Recent interest in Reinforcement Learning has motivated the study of methods …
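In the tabular softmax setting these convergence analyses consider, the NPG update takes the standard multiplicative-weights form (written in common notation, not quoted from the paper):

```latex
\pi_{k+1}(a \mid s) \;=\;
\frac{\pi_k(a \mid s)\, \exp\!\big(\eta\, Q^{\pi_k}(s,a)\big)}
{\sum_{a'} \pi_k(a' \mid s)\, \exp\!\big(\eta\, Q^{\pi_k}(s,a')\big)}
```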
A Lyapunov theory for finite-sample guarantees of asynchronous Q-learning and TD-learning variants
Z Chen, ST Maguluri, S Shakkottai… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper develops a unified framework to study finite-sample convergence guarantees of
a large class of value-based asynchronous reinforcement learning (RL) algorithms. We do …
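The "asynchronous" qualifier means only the single state-action entry visited by one Markovian sample path is updated per step. A minimal sketch of such a Q-learning loop, with an assumed environment API and an epsilon-greedy behavior policy chosen for illustration:

```python
import numpy as np

def async_q_learning(env, n_states, n_actions, steps=10_000,
                     alpha=0.1, gamma=0.99, eps=0.1):
    """Hypothetical asynchronous Q-learning along one Markovian trajectory:
    exactly one (s, a) entry of the Q-table changes at each step."""
    Q = np.zeros((n_states, n_actions))
    s = env.reset()                              # assumed env API
    for _ in range(steps):
        if np.random.rand() < eps:               # epsilon-greedy behavior policy
            a = np.random.randint(n_actions)
        else:
            a = int(np.argmax(Q[s]))
        s_next, r = env.step(a)
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
    return Q
```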
A Lyapunov theory for finite-sample guarantees of Markovian stochastic approximation
Z Chen, ST Maguluri, S Shakkottai… - Operations …, 2024 - pubsonline.informs.org
This paper develops a unified Lyapunov framework for finite-sample analysis of a Markovian
stochastic approximation (SA) algorithm under a contraction operator with respect to an …
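The Markovian SA iteration studied here can be written in the standard form below, where {Y_k} is a Markov chain with stationary distribution mu and the expected operator is a contraction with respect to some norm (a generic rendering, not the paper's exact notation):

```latex
x_{k+1} \;=\; x_k \;+\; \alpha_k \big( F(x_k, Y_k) - x_k \big),
\qquad
\bar{F}(x) := \mathbb{E}_{Y \sim \mu}\!\big[ F(x, Y) \big]
\ \text{a contraction w.r.t.}\ \|\cdot\|_c
```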
A natural actor-critic framework for zero-sum Markov games
We introduce algorithms based on natural actor-critic and analyze their sample complexity
for solving two player zero-sum Markov games in the tabular case. Our results improve the …
Finite-sample analysis of two-time-scale natural actor–critic algorithm
Actor–critic style two-time-scale algorithms are one of the most popular methods in
reinforcement learning, and have seen great empirical success. However, their performance …
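"Two-time-scale" refers to coupled updates in which the critic moves on a faster step size than the actor; schematically (a generic form, not the paper's exact recursions):

```latex
\begin{aligned}
w_{k+1} &= w_k + \beta_k\, g_{\mathrm{critic}}(w_k, \theta_k; X_k)
&&\text{(fast time scale)}\\
\theta_{k+1} &= \theta_k + \alpha_k\, g_{\mathrm{actor}}(w_k, \theta_k; X_k)
&&\text{(slow time scale, } \alpha_k / \beta_k \to 0\text{)}
\end{aligned}
```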
Sample complexity of policy-based methods under off-policy sampling and linear function approximation
Z Chen, ST Maguluri - International Conference on Artificial …, 2022 - proceedings.mlr.press
In this work, we study policy-based methods for solving the reinforcement learning problem,
where off-policy sampling and linear function approximation are employed for policy …
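To illustrate the two ingredients named in the abstract, here is a plain off-policy TD(0) sketch with linear features and per-step importance ratios; this is an assumed illustration of off-policy evaluation with linear function approximation, not the paper's actual algorithm:

```python
import numpy as np

def off_policy_td0(trajectory, phi, dim, pi_target, pi_behavior,
                   alpha=0.05, gamma=0.99):
    """Hypothetical off-policy TD(0): evaluate a target policy from
    behavior-policy samples using importance-sampling corrections."""
    w = np.zeros(dim)
    for (s, a, r, s_next) in trajectory:           # behavior-policy samples
        rho = pi_target(a, s) / pi_behavior(a, s)  # importance-sampling ratio
        delta = r + gamma * phi(s_next) @ w - phi(s) @ w
        w += alpha * rho * delta * phi(s)          # corrected semi-gradient step
    return w
```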