Risk-averse trust region optimization for reward-volatility reduction

S Gu, L Yang, Y Du, G Chen, F Walter, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

Reinforcement Learning (RL) has achieved tremendous success in many complex decision-
making tasks. However, safety concerns are raised during deploying RL in real-world …

Cited by 247 · Related articles · All 2 versions

[PDF] neurips.cc

Constrained update projection approach to safe policy optimization

L Yang, J Ji, J Dai, L Zhang, B Zhou… - Advances in …, 2022 - proceedings.neurips.cc

Safe reinforcement learning (RL) studies problems where an intelligent agent has to not only
maximize reward but also avoid exploring unsafe areas. In this study, we propose CUP, a …

Cited by 37 · Related articles · All 9 versions

[PDF] jmlr.org

Convex reinforcement learning in finite trials

M Mutti, R De Santi, P De Bartolomeis… - Journal of Machine …, 2023 - jmlr.org

Convex Reinforcement Learning (RL) is a recently introduced framework that generalizes
the standard RL objective to any convex (or concave) function of the state distribution …

Cited by 10 · Related articles · All 5 versions

[PDF] acm.org

Reinforcement learning for quantitative trading

S Sun, R Wang, B An - ACM Transactions on Intelligent Systems and …, 2023 - dl.acm.org

Quantitative trading (QT), which refers to the usage of mathematical models and data-driven
techniques in analyzing the financial market, has been a popular topic in both academia and …

Cited by 55 · Related articles · All 6 versions

[PDF] neurips.cc

An alternative to variance: Gini deviation for risk-averse policy gradient

Y Luo, G Liu, P Poupart, Y Pan - Advances in Neural …, 2023 - proceedings.neurips.cc

Restricting the variance of a policy's return is a popular choice in risk-averse Reinforcement
Learning (RL) due to its clear mathematical definition and easy interpretability. Traditional …

Cited by 8 · Related articles · All 6 versions

[PDF] neurips.cc

Challenging common assumptions in convex reinforcement learning

M Mutti, R De Santi… - Advances in Neural …, 2022 - proceedings.neurips.cc

The classic Reinforcement Learning (RL) formulation concerns the maximization of
a scalar reward function. More recently, convex RL has been introduced to extend the RL …

Cited by 17 · Related articles · All 9 versions

A Review of Safe Reinforcement Learning: Methods, Theories and Applications

S Gu, L Yang, Y Du, G Chen, F Walter… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Reinforcement Learning (RL) has achieved tremendous success in many complex decision-
making tasks. However, safety concerns are raised during deploying RL in real-world …

Mean-variance policy iteration for risk-averse reinforcement learning

S Zhang, B Liu, S Whiteson - Proceedings of the AAAI Conference on …, 2021 - ojs.aaai.org

We present a mean-variance policy iteration (MVPI) framework for risk-averse control in a
discounted infinite horizon MDP optimizing the variance of a per-step reward random …

Cited by 39 · Related articles · All 9 versions

[PDF] neurips.cc

Off-policy evaluation with deficient support using side information

N Felicioni, M Ferrari Dacrema… - Advances in …, 2022 - proceedings.neurips.cc

The Off-Policy Evaluation (OPE) problem consists in evaluating the performance of
new policies from the data collected by another one. OPE is crucial when evaluating a new …

Cited by 8 · Related articles · All 9 versions

[PDF] researchgate.net

CVA hedging with reinforcement learning

R Daluiso, M Pinciroli, M Trapletti, E Vittori - Proceedings of the Fourth …, 2023 - dl.acm.org

This work considers the problem of a trader who must manage the Credit Valuation
Adjustment (CVA) of a derivative, defined as the risk-neutral expectation of losses incurred if …

Cited by 6 · Related articles · All 2 versions