A review of safe reinforcement learning: Methods, theory and applications
Reinforcement Learning (RL) has achieved tremendous success in many complex decision-
making tasks. However, safety concerns are raised during deploying RL in real-world …
A comprehensive survey on safe reinforcement learning
J García, F Fernández - Journal of Machine Learning Research, 2015 - jmlr.org
Safe Reinforcement Learning can be defined as the process of learning policies
that maximize the expectation of the return in problems in which it is important to ensure …
A Lyapunov-based approach to safe reinforcement learning
In many real-world reinforcement learning (RL) problems, besides optimizing the main
objective function, an agent must concurrently avoid violating a number of constraints. In …
Reward constrained policy optimization
Solving tasks in Reinforcement Learning is no easy feat. As the goal of the agent is to
maximize the accumulated reward, it often learns to exploit loopholes and misspecifications …
Risk-constrained reinforcement learning with percentile risk criteria
In many sequential decision-making problems one is interested in minimizing an expected
cumulative cost while taking into account risk, i.e., increased awareness of events of small …
Constrained reinforcement learning has zero duality gap
S Paternain, L Chamon… - Advances in Neural …, 2019 - proceedings.neurips.cc
Autonomous agents must often deal with conflicting requirements, such as completing tasks
using the least amount of time/energy, learning multiple tasks, or dealing with multiple …
RUDDER: Return decomposition for delayed rewards
JA Arjona-Medina, M Gillhofer… - Advances in …, 2019 - proceedings.neurips.cc
We propose RUDDER, a novel reinforcement learning approach for delayed rewards in
finite Markov decision processes (MDPs). In MDPs the Q-values are equal to the expected …
Algorithms for CVaR optimization in MDPs
Y Chow, M Ghavamzadeh - Advances in neural information …, 2014 - proceedings.neurips.cc
In many sequential decision-making problems we may want to manage risk by minimizing
some measure of variability in costs in addition to minimizing a standard criterion …
Actor-critic algorithms for risk-sensitive MDPs
P La, M Ghavamzadeh - Advances in neural information …, 2013 - proceedings.neurips.cc
In many sequential decision-making problems we may want to manage risk by minimizing
some measure of variability in rewards in addition to maximizing a standard criterion …
Text-based interactive recommendation via constraint-augmented reinforcement learning
Text-based interactive recommendation provides richer user preferences and has
demonstrated advantages over traditional interactive recommender systems. However …