- Academic resource search

Recent advances in reinforcement learning in finance

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library

The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

Cited by 194 - Related articles - All 13 versions

Policy gradient method for robust reinforcement learning

Y Wang, S Zou - International conference on machine …, 2022 - proceedings.mlr.press

This paper develops the first policy gradient method with global optimality guarantee and
complexity analysis for robust reinforcement learning under model mismatch. Robust …

Cited by 75 - Related articles - All 7 versions

Online robust reinforcement learning with model uncertainty

Y Wang, S Zou - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc

Robust reinforcement learning (RL) is to find a policy that optimizes the worst-case
performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust …

Cited by 108 - Related articles - All 10 versions

A theory of regularized Markov decision processes

M Geist, B Scherrer, O Pietquin - … Conference on Machine …, 2019 - proceedings.mlr.press

Many recent successful (deep) reinforcement learning algorithms make use of
regularization, generally based on entropy or Kullback-Leibler divergence. We propose a …

Cited by 345 - Related articles - All 9 versions

Bridging the gap between value and policy based reinforcement learning

O Nachum, M Norouzi, K Xu… - Advances in neural …, 2017 - proceedings.neurips.cc

We establish a new connection between value and policy based reinforcement learning
(RL) based on a relationship between softmax temporal value consistency and policy …

Cited by 549 - Related articles - All 14 versions

On the properties of the softmax function with application in game theory and reinforcement learning

B Gao, L Pavel - arXiv preprint arXiv:1704.00805, 2017 - arxiv.org

In this paper, we utilize results from convex analysis and monotone operator theory to derive
additional properties of the softmax function that have not yet been covered in the existing …

Cited by 437 - Related articles - All 4 versions

SBEED: Convergent reinforcement learning with nonlinear function approximation

B Dai, A Shaw, L Li, L Xiao, N He… - International …, 2018 - proceedings.mlr.press

When function approximation is used, solving the Bellman optimality equation with stability
guarantees has remained a major open problem in reinforcement learning for decades. The …

Cited by 320 - Related articles - All 8 versions

Learning mean-field games

X Guo, A Hu, R Xu, J Zhang - Advances in neural …, 2019 - proceedings.neurips.cc

This paper presents a general mean-field game (GMFG) framework for simultaneous
learning and decision-making in stochastic games with a large population. It first establishes …

Cited by 236 - Related articles - All 9 versions

A unified view of entropy-regularized Markov decision processes

G Neu, A Jonsson, V Gómez - arXiv preprint arXiv:1705.07798, 2017 - arxiv.org

We propose a general framework for entropy-regularized average-reward reinforcement
learning in Markov decision processes (MDPs). Our approach is based on extending the …

Cited by 290 - Related articles - All 9 versions

Finite-sample analysis for SARSA with linear function approximation

S Zou, T Xu, Y Liang - Advances in neural information …, 2019 - proceedings.neurips.cc

SARSA is an on-policy algorithm to learn a Markov decision process policy in reinforcement
learning. We investigate the SARSA algorithm with linear function approximation under the …

Cited by 207 - Related articles - All 9 versions