Adversarial training for high-stakes reliability

D Ziegler, S Nix, L Chan, T Bauman… - Advances in …, 2022 - proceedings.neurips.cc
In the future, powerful AI systems may be deployed in high-stakes settings, where a single
failure could be catastrophic. One technique for improving AI safety in high-stakes settings is …

Multi-modal inverse constrained reinforcement learning from a mixture of demonstrations

G Qiao, G Liu, P Poupart, Z Xu - Advances in Neural …, 2024 - proceedings.neurips.cc
Inverse Constrained Reinforcement Learning (ICRL) aims to recover the underlying
constraints respected by expert agents in a data-driven manner. Existing ICRL algorithms …
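
For orientation, the forward problem that ICRL inverts is a constrained MDP; a minimal statement of that objective (the cost function c and threshold ε are standard constrained-RL notation, not necessarily this paper's):

```latex
\max_{\pi}\; \mathbb{E}_{\pi}\!\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\!\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\Big] \le \epsilon
```

ICRL observes near-optimal expert trajectories and infers the unknown cost c (and implicitly its threshold), rather than the reward as in standard inverse RL.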

[BOOK][B] Distributional reinforcement learning

MG Bellemare, W Dabney, M Rowland - 2023 - books.google.com
The first comprehensive guide to distributional reinforcement learning, providing a new
mathematical formalism for thinking about decisions from a probabilistic perspective …
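
The formalism's central object is the random return rather than its expectation; the distributional Bellman equation, standard in this literature, replaces the usual value recursion with equality in distribution:

```latex
Z^{\pi}(s,a) \;\overset{D}{=}\; R(s,a) + \gamma\, Z^{\pi}(S', A'),
\qquad S' \sim P(\cdot \mid s,a),\; A' \sim \pi(\cdot \mid S')
```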

Fast Bellman updates for Wasserstein distributionally robust MDPs

Z Yu, L Dai, S Xu, S Gao, CP Ho - Advances in Neural …, 2024 - proceedings.neurips.cc
Markov decision processes (MDPs) are often highly sensitive to model ambiguity. In recent
years, robust MDPs have emerged as an effective framework to …
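
A robust Bellman backup makes the fix for this sensitivity concrete: an adversary picks the worst transition kernel from an ambiguity set before the agent maximizes. A minimal sketch, assuming a finite state/action space and approximating the Wasserstein ball by a finite list of candidate kernels (the paper's contribution is a fast exact update for the true Wasserstein set, which this does not implement):

```python
import numpy as np

def robust_bellman_backup(V, r, P_candidates, gamma=0.95):
    """One robust value-iteration step: V(s) = max_a min_P [r + gamma * E_P V].

    V: (S,) current value estimates
    r: (S, A) rewards
    P_candidates: list of (S, A, S) transition kernels approximating the
        ambiguity set (e.g. samples inside a Wasserstein ball around a
        nominal model)
    """
    Q_worst = np.full(r.shape, np.inf)
    for P in P_candidates:
        Q = r + gamma * (P @ V)            # expected backup under this kernel, (S, A)
        Q_worst = np.minimum(Q_worst, Q)   # adversary: worst kernel per (s, a)
    return Q_worst.max(axis=1)             # agent: best action per state
```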

Universal off-policy evaluation

Y Chandak, S Niekum, B da Silva… - Advances in …, 2021 - proceedings.neurips.cc
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …
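
"Universal" here refers to evaluating any distributional functional of the new policy's returns, not just their mean. A minimal sketch of the underlying idea, estimating one point of the return CDF with ordinary trajectory-wise importance sampling (the function names and undiscounted return are my simplifications; the paper additionally provides high-confidence bounds):

```python
def is_return_cdf_estimate(trajectories, pi_e, pi_b, v):
    """Importance-sampling estimate of P(G <= v) under the evaluation
    policy pi_e, from episodes collected with the behavior policy pi_b.

    trajectories: list of episodes, each a list of (s, a, r) tuples
    pi_e(s, a), pi_b(s, a): action probabilities under each policy
    v: threshold at which the return CDF is evaluated
    """
    total = 0.0
    for episode in trajectories:
        rho, G = 1.0, 0.0
        for s, a, r in episode:
            rho *= pi_e(s, a) / pi_b(s, a)  # trajectory importance weight
            G += r                          # episode return
        total += rho * (G <= v)             # reweighted indicator
    return total / len(trajectories)
```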

Scalable Bayesian inverse reinforcement learning

AJ Chan, M van der Schaar - arXiv preprint arXiv:2102.06483, 2021 - arxiv.org
Bayesian inference over the reward presents an ideal solution to the ill-posed nature of the
inverse reinforcement learning problem. Unfortunately, current methods generally do not …
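
Concretely, the Bayesian formulation places a posterior over rewards given demonstrations D, typically with a Boltzmann-rational likelihood (β is an inverse-temperature parameter; this is the standard setup, not necessarily this paper's exact model):

```latex
P(R \mid \mathcal{D}) \;\propto\; P(\mathcal{D} \mid R)\, P(R),
\qquad
P(a \mid s, R) \;=\; \frac{\exp\big(\beta\, Q^{*}_{R}(s,a)\big)}{\sum_{a'} \exp\big(\beta\, Q^{*}_{R}(s,a')\big)}
```

The scalability problem is visible in the likelihood: each candidate reward R requires solving the MDP for Q*_R, which makes naive posterior sampling expensive.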

STAP: Sequencing task-agnostic policies

C Agia, T Migimatsu, J Wu… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Advances in robotic skill acquisition have made it possible to build general-purpose libraries
of learned skills for downstream manipulation tasks. However, naively executing these skills …

Entropic risk optimization in discounted MDPs

JL Hau, M Petrik… - … Conference on Artificial …, 2023 - proceedings.mlr.press
Risk-averse Markov Decision Processes (MDPs) have optimal policies that achieve
high returns with low variability, but these MDPs are often difficult to solve. Only a few …
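
The entropic risk measure in the title, for a random return G and risk parameter α, has the standard definition below; the small-α expansion shows why α < 0 is risk-averse:

```latex
\mathrm{ERM}_{\alpha}(G) \;=\; \frac{1}{\alpha}\,\log \mathbb{E}\big[e^{\alpha G}\big]
\;\approx\; \mathbb{E}[G] + \frac{\alpha}{2}\,\mathrm{Var}[G] \quad \text{as } \alpha \to 0
```

Part of the difficulty alluded to is that, once discounting is introduced, this objective no longer decomposes into a simple Bellman recursion.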

Policy gradient Bayesian robust optimization for imitation learning

Z Javed, DS Brown, S Sharma, J Zhu… - International …, 2021 - proceedings.mlr.press
The difficulty in specifying rewards for many real-world problems has led to an increased
focus on learning rewards from human feedback, such as demonstrations. However, there …
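
A common shape for the robust objective in this setting blends expected performance with a tail-risk term over the reward posterior (λ and α are generic trade-off parameters here, not necessarily the paper's notation):

```latex
\max_{\pi}\;\; \lambda\, \mathbb{E}_{R \sim P(R \mid \mathcal{D})}\big[V^{\pi}_{R}\big]
\;+\; (1-\lambda)\, \mathrm{CVaR}_{\alpha}\!\big[V^{\pi}_{R}\big]
```

Setting λ = 1 recovers risk-neutral imitation; λ = 0 optimizes purely against the α-tail of reward hypotheses.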

On the convex formulations of robust Markov decision processes

J Grand-Clément, M Petrik - Mathematics of Operations …, 2024 - pubsonline.informs.org
Robust Markov decision processes (MDPs) are used for applications of dynamic
optimization in uncertain environments and have been studied extensively. Many of the …