A review of safe reinforcement learning: Methods, theory and applications

S Gu, L Yang, Y Du, G Chen, F Walter, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Reinforcement Learning (RL) has achieved tremendous success in many complex decision-making
tasks. However, safety concerns are raised during deploying RL in real-world …

A comprehensive survey on safe reinforcement learning

J García, F Fernández - Journal of Machine Learning Research, 2015 - jmlr.org
Safe Reinforcement Learning can be defined as the process of learning policies
that maximize the expectation of the return in problems in which it is important to ensure …

A Lyapunov-based approach to safe reinforcement learning

Y Chow, O Nachum… - Advances in neural …, 2018 - proceedings.neurips.cc
In many real-world reinforcement learning (RL) problems, besides optimizing the main
objective function, an agent must concurrently avoid violating a number of constraints. In …

Reward constrained policy optimization

C Tessler, DJ Mankowitz, S Mannor - arXiv preprint arXiv:1805.11074, 2018 - arxiv.org
Solving tasks in Reinforcement Learning is no easy feat. As the goal of the agent is to
maximize the accumulated reward, it often learns to exploit loopholes and misspecifications …

Risk-constrained reinforcement learning with percentile risk criteria

Y Chow, M Ghavamzadeh, L Janson… - Journal of Machine …, 2018 - jmlr.org
In many sequential decision-making problems one is interested in minimizing an expected
cumulative cost while taking into account risk, i.e., increased awareness of events of small …

Constrained reinforcement learning has zero duality gap

S Paternain, L Chamon… - Advances in Neural …, 2019 - proceedings.neurips.cc
Autonomous agents must often deal with conflicting requirements, such as completing tasks
using the least amount of time/energy, learning multiple tasks, or dealing with multiple …

RUDDER: Return decomposition for delayed rewards

JA Arjona-Medina, M Gillhofer… - Advances in …, 2019 - proceedings.neurips.cc
We propose RUDDER, a novel reinforcement learning approach for delayed rewards in
finite Markov decision processes (MDPs). In MDPs the Q-values are equal to the expected …

Algorithms for CVaR optimization in MDPs

Y Chow, M Ghavamzadeh - Advances in neural information …, 2014 - proceedings.neurips.cc
In many sequential decision-making problems we may want to manage risk by minimizing
some measure of variability in costs in addition to minimizing a standard criterion …

Actor-critic algorithms for risk-sensitive MDPs

P La, M Ghavamzadeh - Advances in neural information …, 2013 - proceedings.neurips.cc
In many sequential decision-making problems we may want to manage risk by minimizing
some measure of variability in rewards in addition to maximizing a standard criterion …

Text-based interactive recommendation via constraint-augmented reinforcement learning

R Zhang, T Yu, Y Shen, H Jin… - Advances in neural …, 2019 - proceedings.neurips.cc
Text-based interactive recommendation provides richer user preferences and has
demonstrated advantages over traditional interactive recommender systems. However …