Stochastic policy gradient methods: Improved sample complexity for Fisher-non-degenerate policies
Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed
the development of their theoretical foundations. Despite the huge efforts directed at the …
Linear convergence of natural policy gradient methods with log-linear policies
We consider infinite-horizon discounted Markov decision processes and study the
convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log …
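As a concrete illustration of what these NPG entries study, here is a minimal sketch of one natural policy gradient step for a log-linear policy pi(a|s) ∝ exp(theta^T phi(s,a)); the feature matrix phi, advantage estimates adv, and step size are hypothetical placeholders, not details taken from the paper.

```python
import numpy as np

def npg_step(theta, phi, adv, lr=0.1):
    """One NPG step for a log-linear policy (hypothetical inputs).

    theta: (d,) parameter vector
    phi:   (n, d) centered score features phi(s,a) - E_pi[phi(s,.)]
           for n sampled state-action pairs
    adv:   (n,) advantage estimates under the current policy
    """
    # Vanilla policy gradient estimate: E[adv * grad log pi].
    g = phi.T @ adv / len(adv)
    # Empirical Fisher information: F = E[grad log pi grad log pi^T].
    F = phi.T @ phi / len(adv)
    # Natural gradient direction F^+ g (pseudo-inverse handles a
    # possibly degenerate Fisher matrix).
    return theta + lr * np.linalg.pinv(F) @ g
```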
Improved sample complexity analysis of natural policy gradient algorithm with general parameterization for infinite horizon discounted reward Markov decision …
WU Mondal, V Aggarwal - International Conference on …, 2024 - proceedings.mlr.press
We consider the problem of designing sample efficient learning algorithms for infinite
horizon discounted reward Markov Decision Process. Specifically, we propose the …
On the linear convergence of natural policy gradient algorithm
S Khodadadian, PR Jhunjhunwala… - 2021 60th IEEE …, 2021 - ieeexplore.ieee.org
Markov Decision Processes are classically solved using Value Iteration and Policy Iteration
algorithms. Recent interest in Reinforcement Learning has motivated the study of methods …
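For reference, the classical Value Iteration baseline that this entry contrasts with policy-based methods can be sketched in a few lines; the transition tensor P, reward matrix R, and discount gamma below are hypothetical inputs for a toy tabular MDP.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """P: (S, A, S) transition probabilities, R: (S, A) expected rewards."""
    V = np.zeros(P.shape[0])
    while True:
        # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * E[V(s')].
        Q = R + gamma * P @ V
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            # Return optimal values and the greedy (optimal) policy.
            return V_new, Q.argmax(axis=1)
        V = V_new
```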
Softmax policy gradient methods can take exponential time to converge
The softmax policy gradient (PG) method, which performs gradient ascent under softmax
policy parameterization, is arguably one of the de facto implementations of policy …
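A toy illustration of the softmax parameterization this entry refers to: exact gradient ascent on a three-armed bandit, a far simpler setting than the MDPs the paper analyzes; the reward vector and step size are made up for the example.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())   # subtract max for numerical stability
    return z / z.sum()

r = np.array([1.0, 0.8, 0.2])         # hypothetical per-action rewards
theta = np.zeros(3)                   # softmax logits, one per action
for _ in range(500):
    pi = softmax(theta)
    # Exact gradient of J(theta) = pi . r under the softmax:
    # dJ/dtheta_a = pi_a * (r_a - pi . r)
    theta += 0.1 * pi * (r - pi @ r)
print(softmax(theta))                 # mass concentrates on the best arm
```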
A natural actor-critic framework for zero-sum Markov games
We introduce algorithms based on natural actor-critic and analyze their sample complexity
for solving two-player zero-sum Markov games in the tabular case. Our results improve the …
A finite-sample analysis of payoff-based independent learning in zero-sum stochastic games
In this work, we study two-player zero-sum stochastic games and develop a variant of the
smoothed best-response learning dynamics that combines independent learning dynamics …
Finite-sample analysis of off-policy natural actor-critic algorithm
S Khodadadian, Z Chen… - … Conference on Machine …, 2021 - proceedings.mlr.press
In this paper, we provide finite-sample convergence guarantees for an off-policy variant of
the natural actor-critic (NAC) algorithm based on Importance Sampling. In particular, we …
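The importance-sampling correction at the heart of such off-policy variants can be sketched as follows; the two-action behaviour and target policies below are hypothetical, chosen only to show that reweighting samples by pi_target / pi_behaviour recovers an unbiased estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
pi_b = np.array([0.5, 0.5])   # behaviour policy that generates the data
pi_t = np.array([0.9, 0.1])   # target policy we want to evaluate
r = np.array([1.0, 0.0])      # hypothetical reward for each action

a = rng.choice(2, size=10_000, p=pi_b)   # actions sampled from pi_b
rho = pi_t[a] / pi_b[a]                  # per-sample importance ratios
est = np.mean(rho * r[a])                # off-policy estimate of E_pi_t[r]
print(est, pi_t @ r)                     # both close to 0.9
```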
Sample complexity of policy-based methods under off-policy sampling and linear function approximation
Z Chen, ST Maguluri - International Conference on Artificial …, 2022 - proceedings.mlr.press
In this work, we study policy-based methods for solving the reinforcement learning problem,
where off-policy sampling and linear function approximation are employed for policy …
Finite-sample analysis of off-policy natural actor–critic with linear function approximation
Z Chen, S Khodadadian… - IEEE Control Systems …, 2022 - ieeexplore.ieee.org
In this letter, we develop a novel variant of the natural actor-critic algorithm using off-policy
sampling and linear function approximation, and establish a sample complexity of …