Policy gradients with variance related risk criteria

S Gu, L Yang, Y Du, G Chen, F Walter, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

Reinforcement Learning (RL) has achieved tremendous success in many complex decision-
making tasks. However, safety concerns are raised during deploying RL in real-world …

Cited by 291 · Related articles · All 2 versions

A comprehensive survey on safe reinforcement learning

J García, F Fernández - Journal of Machine Learning Research, 2015 - jmlr.org

Safe Reinforcement Learning can be defined as the process of learning policies
that maximize the expectation of the return in problems in which it is important to ensure …

Cited by 2073 · Related articles · All 5 versions

A Lyapunov-based approach to safe reinforcement learning

Y Chow, O Nachum… - Advances in neural …, 2018 - proceedings.neurips.cc

In many real-world reinforcement learning (RL) problems, besides optimizing the main
objective function, an agent must concurrently avoid violating a number of constraints. In …

Cited by 629 · Related articles · All 12 versions

Reward constrained policy optimization

C Tessler, DJ Mankowitz, S Mannor - arXiv preprint arXiv:1805.11074, 2018 - arxiv.org

Solving tasks in Reinforcement Learning is no easy feat. As the goal of the agent is to
maximize the accumulated reward, it often learns to exploit loopholes and misspecifications …

Cited by 620 · Related articles · All 4 versions

Risk-constrained reinforcement learning with percentile risk criteria

Y Chow, M Ghavamzadeh, L Janson… - Journal of Machine …, 2018 - jmlr.org

In many sequential decision-making problems one is interested in minimizing an expected
cumulative cost while taking into account risk, i.e., increased awareness of events of small …

Cited by 624 · Related articles · All 12 versions

Constrained reinforcement learning has zero duality gap

S Paternain, L Chamon… - Advances in Neural …, 2019 - proceedings.neurips.cc

Autonomous agents must often deal with conflicting requirements, such as completing tasks
using the least amount of time/energy, learning multiple tasks, or dealing with multiple …

Cited by 213 · Related articles · All 8 versions

RUDDER: Return decomposition for delayed rewards

JA Arjona-Medina, M Gillhofer… - Advances in …, 2019 - proceedings.neurips.cc

We propose RUDDER, a novel reinforcement learning approach for delayed rewards in
finite Markov decision processes (MDPs). In MDPs the Q-values are equal to the expected …

Cited by 269 · Related articles · All 9 versions

Algorithms for CVaR optimization in MDPs

Y Chow, M Ghavamzadeh - Advances in neural information …, 2014 - proceedings.neurips.cc

In many sequential decision-making problems we may want to manage risk by minimizing
some measure of variability in costs in addition to minimizing a standard criterion …

Cited by 402 · Related articles · All 11 versions

Actor-critic algorithms for risk-sensitive MDPs

P La, M Ghavamzadeh - Advances in neural information …, 2013 - proceedings.neurips.cc

In many sequential decision-making problems we may want to manage risk by minimizing
some measure of variability in rewards in addition to maximizing a standard criterion …

Cited by 335 · Related articles · All 28 versions

Text-based interactive recommendation via constraint-augmented reinforcement learning

R Zhang, T Yu, Y Shen, H Jin… - Advances in neural …, 2019 - proceedings.neurips.cc

Text-based interactive recommendation provides richer user preferences and has
demonstrated advantages over traditional interactive recommender systems. However …

Cited by 148 · Related articles · All 14 versions