Recent advances in reinforcement learning in finance

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library
The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

Policy gradient method for robust reinforcement learning

Y Wang, S Zou - International conference on machine …, 2022 - proceedings.mlr.press
This paper develops the first policy gradient method with global optimality guarantee and
complexity analysis for robust reinforcement learning under model mismatch. Robust …

Online robust reinforcement learning with model uncertainty

Y Wang, S Zou - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc
Robust reinforcement learning (RL) is to find a policy that optimizes the worst-case
performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust …

A theory of regularized Markov decision processes

M Geist, B Scherrer, O Pietquin - … Conference on Machine …, 2019 - proceedings.mlr.press
Many recent successful (deep) reinforcement learning algorithms make use of
regularization, generally based on entropy or Kullback-Leibler divergence. We propose a …

Bridging the gap between value and policy based reinforcement learning

O Nachum, M Norouzi, K Xu… - Advances in neural …, 2017 - proceedings.neurips.cc
We establish a new connection between value and policy based reinforcement learning
(RL) based on a relationship between softmax temporal value consistency and policy …

On the properties of the softmax function with application in game theory and reinforcement learning

B Gao, L Pavel - arXiv preprint arXiv:1704.00805, 2017 - arxiv.org
In this paper, we utilize results from convex analysis and monotone operator theory to derive
additional properties of the softmax function that have not yet been covered in the existing …

SBEED: Convergent reinforcement learning with nonlinear function approximation

B Dai, A Shaw, L Li, L Xiao, N He… - International …, 2018 - proceedings.mlr.press
When function approximation is used, solving the Bellman optimality equation with stability
guarantees has remained a major open problem in reinforcement learning for decades. The …

Learning mean-field games

X Guo, A Hu, R Xu, J Zhang - Advances in neural …, 2019 - proceedings.neurips.cc
This paper presents a general mean-field game (GMFG) framework for simultaneous
learning and decision-making in stochastic games with a large population. It first establishes …

A unified view of entropy-regularized Markov decision processes

G Neu, A Jonsson, V Gómez - arXiv preprint arXiv:1705.07798, 2017 - arxiv.org
We propose a general framework for entropy-regularized average-reward reinforcement
learning in Markov decision processes (MDPs). Our approach is based on extending the …

Finite-sample analysis for SARSA with linear function approximation

S Zou, T Xu, Y Liang - Advances in neural information …, 2019 - proceedings.neurips.cc
SARSA is an on-policy algorithm to learn a Markov decision process policy in reinforcement
learning. We investigate the SARSA algorithm with linear function approximation under the …