A review of safe reinforcement learning: Methods, theory and applications

S Gu, L Yang, Y Du, G Chen, F Walter, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Reinforcement Learning (RL) has achieved tremendous success in many complex
decision-making tasks. However, safety concerns are raised during deploying RL in real-world …

Constrained update projection approach to safe policy optimization

L Yang, J Ji, J Dai, L Zhang, B Zhou… - Advances in …, 2022 - proceedings.neurips.cc
Safe reinforcement learning (RL) studies problems where an intelligent agent has to not only
maximize reward but also avoid exploring unsafe areas. In this study, we propose CUP, a …

Convex reinforcement learning in finite trials

M Mutti, R De Santi, P De Bartolomeis… - Journal of Machine …, 2023 - jmlr.org
Convex Reinforcement Learning (RL) is a recently introduced framework that generalizes
the standard RL objective to any convex (or concave) function of the state distribution …

Reinforcement learning for quantitative trading

S Sun, R Wang, B An - ACM Transactions on Intelligent Systems and …, 2023 - dl.acm.org
Quantitative trading (QT), which refers to the usage of mathematical models and data-driven
techniques in analyzing the financial market, has been a popular topic in both academia and …

An alternative to variance: Gini deviation for risk-averse policy gradient

Y Luo, G Liu, P Poupart, Y Pan - Advances in Neural …, 2023 - proceedings.neurips.cc
Restricting the variance of a policy's return is a popular choice in risk-averse Reinforcement
Learning (RL) due to its clear mathematical definition and easy interpretability. Traditional …

Challenging common assumptions in convex reinforcement learning

M Mutti, R De Santi… - Advances in Neural …, 2022 - proceedings.neurips.cc
The classic Reinforcement Learning (RL) formulation concerns the maximization of
a scalar reward function. More recently, convex RL has been introduced to extend the RL …

A Review of Safe Reinforcement Learning: Methods, Theories and Applications

S Gu, L Yang, Y Du, G Chen, F Walter… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Reinforcement Learning (RL) has achieved tremendous success in many complex
decision-making tasks. However, safety concerns are raised during deploying RL in real-world …

Mean-variance policy iteration for risk-averse reinforcement learning

S Zhang, B Liu, S Whiteson - Proceedings of the AAAI Conference on …, 2021 - ojs.aaai.org
We present a mean-variance policy iteration (MVPI) framework for risk-averse control in a
discounted infinite horizon MDP optimizing the variance of a per-step reward random …

Off-policy evaluation with deficient support using side information

N Felicioni, M Ferrari Dacrema… - Advances in …, 2022 - proceedings.neurips.cc
The Off-Policy Evaluation (OPE) problem consists in evaluating the performance of
new policies from the data collected by another one. OPE is crucial when evaluating a new …

CVA hedging with reinforcement learning

R Daluiso, M Pinciroli, M Trapletti, E Vittori - Proceedings of the Fourth …, 2023 - dl.acm.org
This work considers the problem of a trader who must manage the Credit Valuation
Adjustment (CVA) of a derivative, defined as the risk-neutral expectation of losses incurred if …