Adversarial training for high-stakes reliability
In the future, powerful AI systems may be deployed in high-stakes settings, where a single
failure could be catastrophic. One technique for improving AI safety in high-stakes settings is …
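The technique the truncated sentence points to is adversarial training: alternately search for inputs that make the model fail, then train on those failures. A minimal FGSM-style sketch in PyTorch (a generic illustration under assumed names and a toy attack, not the paper's specific setup):

    import torch

    def adversarial_training_step(model, loss_fn, optimizer, x, y, epsilon=0.1):
        # Attack: perturb inputs in the gradient direction that increases the loss.
        x_adv = x.clone().detach().requires_grad_(True)
        loss_fn(model(x_adv), y).backward()
        x_adv = (x_adv + epsilon * x_adv.grad.sign()).detach()

        # Defense: retrain on clean plus failure-inducing examples.
        optimizer.zero_grad()
        loss = loss_fn(model(x), y) + loss_fn(model(x_adv), y)
        loss.backward()
        optimizer.step()
        return loss.item()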
Multi-modal inverse constrained reinforcement learning from a mixture of demonstrations
Abstract: Inverse Constrained Reinforcement Learning (ICRL) aims to recover the underlying
constraints respected by expert agents in a data-driven manner. Existing ICRL algorithms …
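As background, ICRL is typically posed on a constrained MDP: the expert is assumed to act near-optimally subject to an unknown cost constraint, and the learner inverts that assumption. A generic formulation (a sketch of the standard setup, not necessarily this paper's exact objective):

    \max_{\pi} \; \mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t}\, r(s_t, a_t)\Big]
    \quad \text{s.t.} \quad
    \mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t}\, c(s_t, a_t)\Big] \le \epsilon

Given demonstrations and a known reward r, ICRL recovers the cost function c that best explains the expert behavior; the multi-modal setting in the title additionally assumes the demonstrations are a mixture from experts respecting different constraints.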
[BOOK] Distributional reinforcement learning
The first comprehensive guide to distributional reinforcement learning, providing a new
mathematical formalism for thinking about decisions from a probabilistic perspective …
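The probabilistic perspective here means modeling the full distribution of returns rather than only its mean. The standard distributional Bellman equation (general background, not quoted from the book):

    Z(s, a) \overset{D}{=} R(s, a) + \gamma\, Z(S', A'),
    \qquad S' \sim P(\cdot \mid s, a),\; A' \sim \pi(\cdot \mid S')

where \overset{D}{=} denotes equality in distribution; taking expectations of both sides recovers the classical recursion Q(s, a) = \mathbb{E}[R(s, a)] + \gamma\, \mathbb{E}[Q(S', A')].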
Fast Bellman updates for Wasserstein distributionally robust MDPs
Markov decision processes (MDPs) often suffer from the sensitivity issue under model
ambiguity. In recent years, robust MDPs have emerged as an effective framework to …
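In the robust MDP framework the fixed transition kernel is replaced by an ambiguity set, and values are computed against the worst case. A standard robust Bellman update with a Wasserstein ambiguity set, matching the title (background sketch; the notation is generic, not the paper's):

    (T V)(s) = \max_{a} \min_{P \in \mathcal{P}(s, a)} \Big[ r(s, a) + \gamma \sum_{s'} P(s')\, V(s') \Big],
    \qquad \mathcal{P}(s, a) = \{\, P : W(P, \hat{P}(\cdot \mid s, a)) \le \theta \,\}

Here \hat{P} is the nominal kernel, W a Wasserstein distance, and \theta the ambiguity radius; the inner minimization is what makes each Bellman update expensive and is the natural target for fast update schemes.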
Universal off-policy evaluation
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …
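Such counterfactual predictions are classically made with importance sampling over episodes logged by the behavior policy. A minimal sketch of the ordinary importance-sampling estimator (an illustrative textbook baseline, not the paper's universal estimator; names are placeholders):

    import numpy as np

    def importance_sampling_ope(episodes, pi_e, pi_b, gamma=0.99):
        """Estimate the evaluation policy's value from behavior-policy data.

        episodes: list of [(state, action, reward), ...] logged under pi_b.
        pi_e(a, s), pi_b(a, s): action probabilities of the two policies.
        """
        estimates = []
        for episode in episodes:
            weight, ret = 1.0, 0.0
            for t, (s, a, r) in enumerate(episode):
                weight *= pi_e(a, s) / pi_b(a, s)  # cumulative likelihood ratio
                ret += (gamma ** t) * r            # discounted return
            estimates.append(weight * ret)
        return float(np.mean(estimates))           # unbiased if pi_b covers pi_e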
Scalable Bayesian inverse reinforcement learning
AJ Chan, M van der Schaar - arXiv preprint arXiv:2102.06483, 2021 - arxiv.org
Bayesian inference over the reward presents an ideal solution to the ill-posed nature of the
inverse reinforcement learning problem. Unfortunately, current methods generally do not …
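The Bayesian treatment places a posterior over rewards given demonstrations D. In the standard setup (Ramachandran and Amir, 2007), with a Boltzmann-rational likelihood:

    P(R \mid D) \propto P(D \mid R)\, P(R),
    \qquad
    P(D \mid R) = \prod_{(s, a) \in D} \frac{\exp\big(\beta\, Q^{*}_{R}(s, a)\big)}{\sum_{a'} \exp\big(\beta\, Q^{*}_{R}(s, a')\big)}

where \beta is an inverse-temperature (rationality) parameter and Q^{*}_{R} is the optimal Q-function under reward R. Each likelihood evaluation requires solving the MDP for Q^{*}_{R}, which is the main obstacle to scaling this inference.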
STAP: Sequencing task-agnostic policies
Advances in robotic skill acquisition have made it possible to build general-purpose libraries
of learned skills for downstream manipulation tasks. However, naively executing these skills …
Entropic risk optimization in discounted MDPs
Abstract: Risk-averse Markov Decision Processes (MDPs) have optimal policies that achieve
high returns with low variability, but these MDPs are often difficult to solve. Only a few …
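The entropic risk measure in the title has the standard definition (background, not quoted from the paper): for a random return X and risk parameter \beta,

    \rho_{\beta}(X) = \frac{1}{\beta} \log \mathbb{E}\big[ e^{\beta X} \big]

As \beta \to 0 this recovers the plain expectation, and a second-order expansion gives \mathbb{E}[X] + \frac{\beta}{2}\,\mathrm{Var}[X], so \beta < 0 trades mean return for low variability, exactly the "high returns with low variability" described above.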
Policy gradient Bayesian robust optimization for imitation learning
The difficulty in specifying rewards for many real-world problems has led to an increased
focus on learning rewards from human feedback, such as demonstrations. However, there …
On the convex formulations of robust Markov decision processes
J Grand-Clément, M Petrik - Mathematics of Operations …, 2024 - pubsonline.informs.org
Robust Markov decision processes (MDPs) are used for applications of dynamic
optimization in uncertain environments and have been studied extensively. Many of the …