Recent advances in reinforcement learning in finance

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library
The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

[图书][B] Control systems and reinforcement learning

S Meyn - 2022 - books.google.com
A high school student can create deep Q-learning code to control her robot, without any
understanding of the meaning of'deep'or'Q', or why the code sometimes fails. This book is …

On the theory of policy gradient methods: Optimality, approximation, and distribution shift

A Agarwal, SM Kakade, JD Lee, G Mahajan - Journal of Machine Learning …, 2021 - jmlr.org
Policy gradient methods are among the most effective methods in challenging reinforcement
learning problems with large state and/or action spaces. However, little is known about even …

Optimality and approximation with policy gradient methods in markov decision processes

A Agarwal, SM Kakade, JD Lee… - … on Learning Theory, 2020 - proceedings.mlr.press
Policy gradient (PG) methods are among the most effective methods in challenging
reinforcement learning problems with large state and/or action spaces. However, little is …

Natural policy gradient primal-dual method for constrained markov decision processes

D Ding, K Zhang, T Basar… - Advances in Neural …, 2020 - proceedings.neurips.cc
We study sequential decision-making problems in which each agent aims to maximize the
expected total reward while satisfying a constraint on the expected total utility. We employ …

On the global convergence rates of softmax policy gradient methods

J Mei, C Xiao, C Szepesvari… - … on machine learning, 2020 - proceedings.mlr.press
We make three contributions toward better understanding policy gradient methods in the
tabular setting. First, we show that with the true gradient, policy gradient with a softmax …

Fast global convergence of natural policy gradient methods with entropy regularization

S Cen, C Cheng, Y Chen, Y Wei… - Operations …, 2022 - pubsonline.informs.org
Natural policy gradient (NPG) methods are among the most widely used policy optimization
algorithms in contemporary reinforcement learning. This class of methods is often applied in …

Provably efficient exploration in policy optimization

Q Cai, Z Yang, C Jin, Z Wang - International Conference on …, 2020 - proceedings.mlr.press
While policy-based reinforcement learning (RL) achieves tremendous successes in practice,
it is significantly less understood in theory, especially compared with value-based RL. In …

Independent policy gradient methods for competitive reinforcement learning

C Daskalakis, DJ Foster… - Advances in neural …, 2020 - proceedings.neurips.cc
We obtain global, non-asymptotic convergence guarantees for independent learning
algorithms in competitive reinforcement learning settings with two agents (ie, zero-sum …

On the convergence rates of policy gradient methods

L Xiao - Journal of Machine Learning Research, 2022 - jmlr.org
We consider infinite-horizon discounted Markov decision problems with finite state and
action spaces and study the convergence rates of the projected policy gradient method and …