Adversarial training for high-stakes reliability
In the future, powerful AI systems may be deployed in high-stakes settings, where a single
failure could be catastrophic. One technique for improving AI safety in high-stakes settings is …
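The technique the truncated sentence points to is adversarial training: alternately search for inputs that make the model fail, then train on those failures. A minimal FGSM-style sketch in PyTorch (a generic illustration under assumed names and a toy attack, not the paper's specific setup):

    import torch

    def adversarial_training_step(model, loss_fn, optimizer, x, y, epsilon=0.1):
        # Attack: perturb inputs in the gradient direction that increases the loss.
        x_adv = x.clone().detach().requires_grad_(True)
        loss_fn(model(x_adv), y).backward()
        x_adv = (x_adv + epsilon * x_adv.grad.sign()).detach()

        # Defense: retrain on clean plus failure-inducing examples.
        optimizer.zero_grad()
        loss = loss_fn(model(x), y) + loss_fn(model(x_adv), y)
        loss.backward()
        optimizer.step()
        return loss.item()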
Multi-modal inverse constrained reinforcement learning from a mixture of demonstrations
Abstract: Inverse Constrained Reinforcement Learning (ICRL) aims to recover the underlying
constraints respected by expert agents in a data-driven manner. Existing ICRL algorithms …
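As background, ICRL is typically posed on a constrained MDP: the expert is assumed to act near-optimally subject to an unknown cost constraint, and the learner inverts that assumption. A generic formulation (a sketch of the standard setup, not necessarily this paper's exact objective):

    \max_{\pi} \; \mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t}\, r(s_t, a_t)\Big]
    \quad \text{s.t.} \quad
    \mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t}\, c(s_t, a_t)\Big] \le \epsilon

Given demonstrations and a known reward r, ICRL recovers the cost function c that best explains the expert behavior; the multi-modal setting in the title additionally assumes the demonstrations are a mixture from experts respecting different constraints.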
[BOOK] Distributional reinforcement learning
The first comprehensive guide to distributional reinforcement learning, providing a new
mathematical formalism for thinking about decisions from a probabilistic perspective …
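The probabilistic perspective here means modeling the full distribution of returns rather than only its mean. The standard distributional Bellman equation (general background, not quoted from the book):

    Z(s, a) \overset{D}{=} R(s, a) + \gamma\, Z(S', A'),
    \qquad S' \sim P(\cdot \mid s, a),\; A' \sim \pi(\cdot \mid S')

where \overset{D}{=} denotes equality in distribution; taking expectations of both sides recovers the classical recursion Q(s, a) = \mathbb{E}[R(s, a)] + \gamma\, \mathbb{E}[Q(S', A')].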
Fast Bellman updates for Wasserstein distributionally robust MDPs
Markov decision processes (MDPs) often suffer from the sensitivity issue under model
ambiguity. In recent years, robust MDPs have emerged as an effective framework to …
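In the robust MDP framework the fixed transition kernel is replaced by an ambiguity set, and values are computed against the worst case. A standard robust Bellman update with a Wasserstein ambiguity set, matching the title (background sketch; the notation is generic, not the paper's):

    (T V)(s) = \max_{a} \min_{P \in \mathcal{P}(s, a)} \Big[ r(s, a) + \gamma \sum_{s'} P(s')\, V(s') \Big],
    \qquad \mathcal{P}(s, a) = \{\, P : W(P, \hat{P}(\cdot \mid s, a)) \le \theta \,\}

Here \hat{P} is the nominal kernel, W a Wasserstein distance, and \theta the ambiguity radius; the inner minimization is what makes each Bellman update expensive and is the natural target for fast update schemes.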
Universal off-policy evaluation
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …
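Such counterfactual predictions are classically made with importance sampling over episodes logged by the behavior policy. A minimal sketch of the ordinary importance-sampling estimator (an illustrative textbook baseline, not the paper's universal estimator; names are placeholders):

    import numpy as np

    def importance_sampling_ope(episodes, pi_e, pi_b, gamma=0.99):
        """Estimate the evaluation policy's value from behavior-policy data.

        episodes: list of [(state, action, reward), ...] logged under pi_b.
        pi_e(a, s), pi_b(a, s): action probabilities of the two policies.
        """
        estimates = []
        for episode in episodes:
            weight, ret = 1.0, 0.0
            for t, (s, a, r) in enumerate(episode):
                weight *= pi_e(a, s) / pi_b(a, s)  # cumulative likelihood ratio
                ret += (gamma ** t) * r            # discounted return
            estimates.append(weight * ret)
        return float(np.mean(estimates))           # unbiased if pi_b covers pi_e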
Scalable Bayesian inverse reinforcement learning
AJ Chan, M van der Schaar - arXiv preprint arXiv:2102.06483, 2021 - arxiv.org
Bayesian inference over the reward presents an ideal solution to the ill-posed nature of the
inverse reinforcement learning problem. Unfortunately, current methods generally do not …
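The Bayesian treatment places a posterior over rewards given demonstrations D. In the standard setup (Ramachandran and Amir, 2007), with a Boltzmann-rational likelihood:

    P(R \mid D) \propto P(D \mid R)\, P(R),
    \qquad
    P(D \mid R) = \prod_{(s, a) \in D} \frac{\exp\big(\beta\, Q^{*}_{R}(s, a)\big)}{\sum_{a'} \exp\big(\beta\, Q^{*}_{R}(s, a')\big)}

where \beta is an inverse-temperature (rationality) parameter and Q^{*}_{R} is the optimal Q-function under reward R. Each likelihood evaluation requires solving the MDP for Q^{*}_{R}, which is the main obstacle to scaling this inference.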
STAP: Sequencing task-agnostic policies
Advances in robotic skill acquisition have made it possible to build general-purpose libraries
of learned skills for downstream manipulation tasks. However, naively executing these skills …
Entropic risk optimization in discounted MDPs
Abstract: Risk-averse Markov Decision Processes (MDPs) have optimal policies that achieve
high returns with low variability, but these MDPs are often difficult to solve. Only a few …
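The entropic risk measure in the title has the standard definition (background, not quoted from the paper): for a random return X and risk parameter \beta,

    \rho_{\beta}(X) = \frac{1}{\beta} \log \mathbb{E}\big[ e^{\beta X} \big]

As \beta \to 0 this recovers the plain expectation, and a second-order expansion gives \mathbb{E}[X] + \frac{\beta}{2}\,\mathrm{Var}[X], so \beta < 0 trades mean return for low variability, exactly the "high returns with low variability" described above.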
Policy gradient Bayesian robust optimization for imitation learning
The difficulty in specifying rewards for many real-world problems has led to an increased
focus on learning rewards from human feedback, such as demonstrations. However, there …
On the convex formulations of robust Markov decision processes
J Grand-Clément, M Petrik - Mathematics of Operations …, 2024 - pubsonline.informs.org
Robust Markov decision processes (MDPs) are used for applications of dynamic
optimization in uncertain environments and have been studied extensively. Many of the …