Policy-conditioned uncertainty sets for robust Markov decision processes

MA Abdullah, H Ren, HB Ammar, V Milenkovic… - arXiv preprint arXiv …, 2019 - arxiv.org

Reinforcement learning algorithms, though successful, tend to over-fit to training
environments hampering their application to the real-world. This paper proposes $\text …

Cited by 95 · Related articles · All 7 versions


Robust φ-Divergence MDPs

CP Ho, M Petrik, W Wiesemann - Advances in Neural …, 2022 - proceedings.neurips.cc

In recent years, robust Markov decision processes (MDPs) have emerged as a prominent
modeling framework for dynamic decision problems affected by uncertainty. In contrast to …

Cited by 20 · Related articles · All 8 versions


Beyond confidence regions: Tight Bayesian ambiguity sets for robust MDPs

M Petrik, RH Russel - Advances in neural information …, 2019 - proceedings.neurips.cc

Robust MDPs (RMDPs) can be used to compute policies with provable worst-case
guarantees in reinforcement learning. The quality and robustness of an RMDP solution are …

Cited by 69 · Related articles · All 12 versions


Distributionally robust reinforcement learning

E Smirnova, E Dohmatob, J Mary - arXiv preprint arXiv:1902.08708, 2019 - arxiv.org

Real-world applications require RL algorithms to act safely. During learning process, it is
likely that the agent executes sub-optimal actions that may lead to unsafe/poor states of the …

Cited by 65 · Related articles · All 3 versions


Robust Q-learning algorithm for Markov decision processes under Wasserstein uncertainty

A Neufeld, J Sester - Automatica, 2024 - Elsevier

We present a novel Q-learning algorithm tailored to solve distributionally robust Markov
decision problems where the corresponding ambiguity set of transition probabilities for the …

Cited by 14 · Related articles · All 4 versions


Bayesian robust optimization for imitation learning

D Brown, S Niekum, M Petrik - Advances in Neural …, 2020 - proceedings.neurips.cc

One of the main challenges in imitation learning is determining what action an agent should
take when outside the state distribution of the demonstrations. Inverse reinforcement …

Cited by 39 · Related articles · All 7 versions


A Bayesian approach to robust reinforcement learning

E Derman, D Mankowitz, T Mann… - Uncertainty in Artificial …, 2020 - proceedings.mlr.press

Robust Markov Decision Processes (RMDPs) intend to ensure robustness with
respect to changing or adversarial system behavior. In this framework, transitions are …

Cited by 53 · Related articles · All 5 versions


Byzantine-resilient decentralized policy evaluation with linear function approximation

Z Wu, H Shen, T Chen, Q Ling - IEEE Transactions on Signal …, 2021 - ieeexplore.ieee.org

In this paper, we consider the policy evaluation problem in reinforcement learning with
agents on a decentralized and directed network. In order to evaluate the quality of a fixed …

Cited by 27 · Related articles · All 6 versions


Reliable off-policy evaluation for reinforcement learning

J Wang, R Gao, H Zha - Operations Research, 2024 - pubsonline.informs.org

In a sequential decision-making problem, off-policy evaluation estimates the expected
cumulative reward of a target policy using logged trajectory data generated from a different …

Cited by 18 · Related articles · All 10 versions


Optimizing percentile criterion using robust MDPs

B Behzadian, RH Russel, M Petrik… - … Conference on Artificial …, 2021 - proceedings.mlr.press

We address the problem of computing reliable policies in reinforcement learning problems
with limited data. In particular, we compute policies that achieve good returns with high …

Cited by 20 · Related articles · All 7 versions