Global optimality guarantees for policy gradient methods

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library

The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

Cited by 149 - Related articles - All 13 versions

[BOOK][B] Control systems and reinforcement learning

S Meyn - 2022 - books.google.com

A high school student can create deep Q-learning code to control her robot, without any
understanding of the meaning of 'deep' or 'Q', or why the code sometimes fails. This book is …

Cited by 134 - Related articles - All 3 versions

[PDF] jmlr.org

On the theory of policy gradient methods: Optimality, approximation, and distribution shift

A Agarwal, SM Kakade, JD Lee, G Mahajan - Journal of Machine Learning …, 2021 - jmlr.org

Policy gradient methods are among the most effective methods in challenging reinforcement
learning problems with large state and/or action spaces. However, little is known about even …

Cited by 444 - Related articles - All 13 versions

[PDF] mlr.press

Optimality and approximation with policy gradient methods in Markov decision processes

A Agarwal, SM Kakade, JD Lee… - … on Learning Theory, 2020 - proceedings.mlr.press

Policy gradient (PG) methods are among the most effective methods in challenging
reinforcement learning problems with large state and/or action spaces. However, little is …

Cited by 393 - Related articles - All 3 versions

[PDF] neurips.cc

Natural policy gradient primal-dual method for constrained Markov decision processes

D Ding, K Zhang, T Basar… - Advances in Neural …, 2020 - proceedings.neurips.cc

We study sequential decision-making problems in which each agent aims to maximize the
expected total reward while satisfying a constraint on the expected total utility. We employ …

Cited by 203 - Related articles - All 8 versions

[PDF] mlr.press

On the global convergence rates of softmax policy gradient methods

J Mei, C Xiao, C Szepesvari… - … on machine learning, 2020 - proceedings.mlr.press

We make three contributions toward better understanding policy gradient methods in the
tabular setting. First, we show that with the true gradient, policy gradient with a softmax …

Cited by 298 - Related articles - All 14 versions

[PDF] informs.org

Fast global convergence of natural policy gradient methods with entropy regularization

S Cen, C Cheng, Y Chen, Y Wei… - Operations …, 2022 - pubsonline.informs.org

Natural policy gradient (NPG) methods are among the most widely used policy optimization
algorithms in contemporary reinforcement learning. This class of methods is often applied in …

Cited by 215 - Related articles - All 15 versions

[PDF] mlr.press

Provably efficient exploration in policy optimization

Q Cai, Z Yang, C Jin, Z Wang - International Conference on …, 2020 - proceedings.mlr.press

While policy-based reinforcement learning (RL) achieves tremendous successes in practice,
it is significantly less understood in theory, especially compared with value-based RL. In …

Cited by 306 - Related articles - All 9 versions

[PDF] neurips.cc

Independent policy gradient methods for competitive reinforcement learning

C Daskalakis, DJ Foster… - Advances in neural …, 2020 - proceedings.neurips.cc

We obtain global, non-asymptotic convergence guarantees for independent learning
algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum …

Cited by 186 - Related articles - All 7 versions

[PDF] jmlr.org

On the convergence rates of policy gradient methods

L Xiao - Journal of Machine Learning Research, 2022 - jmlr.org

We consider infinite-horizon discounted Markov decision problems with finite state and
action spaces and study the convergence rates of the projected policy gradient method and …

Cited by 101 - Related articles - All 4 versions