Wasserstein robust reinforcement learning
Reinforcement learning algorithms, though successful, tend to overfit to their training
environments, hampering their application to the real world. This paper proposes $\text …
Robust φ-Divergence MDPs
In recent years, robust Markov decision processes (MDPs) have emerged as a prominent
modeling framework for dynamic decision problems affected by uncertainty. In contrast to …
Beyond confidence regions: Tight Bayesian ambiguity sets for robust MDPs
Robust MDPs (RMDPs) can be used to compute policies with provable worst-case
guarantees in reinforcement learning. The quality and robustness of an RMDP solution are …
Distributionally robust reinforcement learning
E Smirnova, E Dohmatob, J Mary - arXiv preprint arXiv:1902.08708, 2019 - arxiv.org
Real-world applications require RL algorithms to act safely. During learning process, it is
likely that the agent executes sub-optimal actions that may lead to unsafe/poor states of the …
Robust Q-learning algorithm for Markov decision processes under Wasserstein uncertainty
We present a novel Q-learning algorithm tailored to solve distributionally robust Markov
decision problems where the corresponding ambiguity set of transition probabilities for the …
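The snippet above only names the setting, not the method. As a rough illustration of what a Wasserstein-robust Bellman backup can look like on a finite state space, here is a minimal sketch; it is not the paper's algorithm, and `worst_case_value`, the ground metric `dist`, and the radius `eps` are all illustrative choices:

```python
import numpy as np
from scipy.optimize import linprog

def worst_case_value(p_hat, v, dist, eps):
    """Worst-case expectation min_p E_p[v] over the Wasserstein-1 ball
    {p : W1(p_hat, p) <= eps} on a finite state space with ground
    metric dist[i, j].  Solved as an LP over transport plans gamma:
        min  sum_ij gamma[i,j] * v[j]
        s.t. sum_j  gamma[i,j]            = p_hat[i]  (source marginal)
             sum_ij gamma[i,j] * dist[i,j] <= eps     (transport budget)
             gamma >= 0
    The adversarial distribution is p[j] = sum_i gamma[i,j]."""
    n = len(p_hat)
    c = np.tile(v, n)                      # objective on row-major gamma
    A_eq = np.kron(np.eye(n), np.ones(n))  # fixes each row sum to p_hat[i]
    A_ub = dist.reshape(1, -1)             # total transport cost
    res = linprog(c, A_ub=A_ub, b_ub=[eps], A_eq=A_eq, b_eq=p_hat,
                  bounds=(0, None), method="highs")
    return res.fun

def robust_q_update(Q, s, a, r, p_hat, dist, eps, alpha=0.5, discount=0.9):
    """One robust Q-learning step: bootstrap from the worst-case
    next-state value instead of the empirical expectation."""
    v_next = Q.max(axis=1)                 # greedy value of each next state
    target = r + discount * worst_case_value(p_hat, v_next, dist, eps)
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target
```

The inner LP makes the adversary explicit; practical algorithms typically replace it with a dual or closed-form reformulation so the robust backup costs little more than a standard one.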
Bayesian robust optimization for imitation learning
One of the main challenges in imitation learning is determining what action an agent should
take when outside the state distribution of the demonstrations. Inverse reinforcement …
A bayesian approach to robust reinforcement learning
Robust Markov Decision Processes (RMDPs) intend to ensure robustness with
respect to changing or adversarial system behavior. In this framework, transitions are …
Byzantine-resilient decentralized policy evaluation with linear function approximation
In this paper, we consider the policy evaluation problem in reinforcement learning with
agents on a decentralized and directed network. In order to evaluate the quality of a fixed …
Reliable off-policy evaluation for reinforcement learning
In a sequential decision-making problem, off-policy evaluation estimates the expected
cumulative reward of a target policy using logged trajectory data generated from a different …
Optimizing the percentile criterion using robust MDPs
We address the problem of computing reliable policies in reinforcement learning problems
with limited data. In particular, we compute policies that achieve good returns with high …