Wasserstein robust reinforcement learning

MA Abdullah, H Ren, HB Ammar, V Milenkovic… - arXiv preprint arXiv …, 2019 - arxiv.org
Reinforcement learning algorithms, though successful, tend to over-fit to training
environments hampering their application to the real-world. This paper proposes $\text …

Robust φ-Divergence MDPs

CP Ho, M Petrik, W Wiesemann - Advances in Neural …, 2022 - proceedings.neurips.cc
In recent years, robust Markov decision processes (MDPs) have emerged as a prominent
modeling framework for dynamic decision problems affected by uncertainty. In contrast to …

Beyond confidence regions: Tight Bayesian ambiguity sets for robust MDPs

M Petrik, RH Russel - Advances in neural information …, 2019 - proceedings.neurips.cc
Robust MDPs (RMDPs) can be used to compute policies with provable worst-case
guarantees in reinforcement learning. The quality and robustness of an RMDP solution are …

Distributionally robust reinforcement learning

E Smirnova, E Dohmatob, J Mary - arXiv preprint arXiv:1902.08708, 2019 - arxiv.org
Real-world applications require RL algorithms to act safely. During learning process, it is
likely that the agent executes sub-optimal actions that may lead to unsafe/poor states of the …

Robust Q-learning algorithm for Markov decision processes under Wasserstein uncertainty

A Neufeld, J Sester - Automatica, 2024 - Elsevier
We present a novel Q-learning algorithm tailored to solve distributionally robust Markov
decision problems where the corresponding ambiguity set of transition probabilities for the …

Bayesian robust optimization for imitation learning

D Brown, S Niekum, M Petrik - Advances in Neural …, 2020 - proceedings.neurips.cc
One of the main challenges in imitation learning is determining what action an agent should
take when outside the state distribution of the demonstrations. Inverse reinforcement …

A Bayesian approach to robust reinforcement learning

E Derman, D Mankowitz, T Mann… - Uncertainty in Artificial …, 2020 - proceedings.mlr.press
Robust Markov Decision Processes (RMDPs) intend to ensure robustness with
respect to changing or adversarial system behavior. In this framework, transitions are …

Byzantine-resilient decentralized policy evaluation with linear function approximation

Z Wu, H Shen, T Chen, Q Ling - IEEE Transactions on Signal …, 2021 - ieeexplore.ieee.org
In this paper, we consider the policy evaluation problem in reinforcement learning with
agents on a decentralized and directed network. In order to evaluate the quality of a fixed …

Reliable off-policy evaluation for reinforcement learning

J Wang, R Gao, H Zha - Operations Research, 2024 - pubsonline.informs.org
In a sequential decision-making problem, off-policy evaluation estimates the expected
cumulative reward of a target policy using logged trajectory data generated from a different …

Optimizing percentile criterion using robust MDPs

B Behzadian, RH Russel, M Petrik… - … Conference on Artificial …, 2021 - proceedings.mlr.press
We address the problem of computing reliable policies in reinforcement learning problems
with limited data. In particular, we compute policies that achieve good returns with high …