Robust Q-learning algorithm for Markov decision processes under Wasserstein uncertainty

A Neufeld, J Sester - Automatica, 2024 - Elsevier
We present a novel Q-learning algorithm tailored to solve distributionally robust Markov
decision problems where the corresponding ambiguity set of transition probabilities for the …
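
A minimal sketch of the generic distributionally robust Q-learning update such work builds on: the TD target takes the worst case over an ambiguity set of transition kernels. The finite kernel set and all names here are illustrative assumptions; the paper's Wasserstein-ball construction is not reproduced.

```python
import numpy as np

def robust_q_update(Q, s, a, r, ambiguity_kernels, alpha=0.1, gamma=0.95):
    """One distributionally robust Q-learning step (illustrative sketch).

    ambiguity_kernels: array of shape (m, n_states); each row is a
    candidate transition distribution p(. | s, a) from a finite,
    illustrative ambiguity set.  The robust target takes the worst case
    over the set, i.e. the infimum of the expected continuation value.
    """
    v_next = Q.max(axis=1)                     # V(s') = max_a' Q(s', a')
    worst = min(p @ v_next for p in ambiguity_kernels)
    target = r + gamma * worst
    Q[s, a] += alpha * (target - Q[s, a])      # standard TD update
    return Q

# toy usage: 3 states, 2 actions, 2 candidate kernels for (s, a)
rng = np.random.default_rng(0)
Q = np.zeros((3, 2))
kernels = rng.dirichlet(np.ones(3), size=2)    # two plausible models
Q = robust_q_update(Q, s=0, a=1, r=1.0, ambiguity_kernels=kernels)
```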

Robust SGLD algorithm for solving non-convex distributionally robust optimisation problems

A Neufeld, MNC En, Y Zhang - arXiv preprint arXiv:2403.09532, 2024 - arxiv.org
In this paper we develop a Stochastic Gradient Langevin Dynamics (SGLD) algorithm
tailored for solving a certain class of non-convex distributionally robust optimisation …
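
For reference, a minimal sketch of a vanilla SGLD iteration on a non-convex toy objective; the step size, inverse temperature `beta`, and objective are illustrative, and the paper's robust variant with its inner worst-case problem is not reproduced here.

```python
import numpy as np

def sgld(grad_f, theta0, eta=1e-3, beta=10.0, n_iter=10_000, seed=0):
    """Vanilla Stochastic Gradient Langevin Dynamics:
    theta_{k+1} = theta_k - eta * grad_f(theta_k) + sqrt(2*eta/beta) * xi_k,
    with xi_k ~ N(0, I).  The injected Gaussian noise lets the iterates
    escape shallow local minima of a non-convex objective.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        noise = rng.standard_normal(theta.shape)
        theta = theta - eta * grad_f(theta) + np.sqrt(2 * eta / beta) * noise
    return theta

# toy non-convex objective f(x) = (x^2 - 1)^2 with gradient 4x(x^2 - 1)
theta_star = sgld(lambda x: 4 * x * (x**2 - 1), theta0=np.array([3.0]))
```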

Beyond discounted returns: Robust Markov decision processes with average and Blackwell optimality

J Grand-Clement, M Petrik, N Vieille - arXiv preprint arXiv:2312.03618, 2023 - arxiv.org
Robust Markov Decision Processes (RMDPs) are a widely used framework for sequential
decision-making under parameter uncertainty. RMDPs have been extensively studied when …

Policy Gradient for Robust Markov Decision Processes

Q Wang, S Xu, CP Ho, M Petrik - arXiv preprint arXiv:2410.22114, 2024 - arxiv.org
We develop a generic policy gradient method with a global optimality guarantee for robust
Markov Decision Processes (MDPs). While policy gradient methods are widely used for …
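
A schematic, Danskin-style robust policy-gradient step under illustrative assumptions: evaluate the softmax policy under each kernel in a finite candidate set, pick the worst, and ascend a finite-difference estimate of the value gradient under that kernel. This sketches the general idea only, not the paper's method or its optimality guarantee.

```python
import numpy as np

def policy_value(P, R, pi, gamma=0.9):
    """Exact policy evaluation: V = (I - gamma * P_pi)^{-1} r_pi."""
    n = R.shape[0]
    P_pi = np.einsum('sa,san->sn', pi, P)      # state transitions under pi
    r_pi = (pi * R).sum(axis=1)
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

def robust_pg_step(logits, kernels, R, mu, lr=0.5, gamma=0.9, eps=1e-5):
    """One Danskin-style robust policy-gradient step: find the worst-case
    kernel for the current policy, then ascend a central finite-difference
    estimate of the value gradient under that fixed kernel."""
    pi = np.exp(logits); pi /= pi.sum(axis=1, keepdims=True)   # softmax
    worst = min(kernels, key=lambda P: mu @ policy_value(P, R, pi, gamma))
    grad = np.zeros_like(logits)
    for idx in np.ndindex(*logits.shape):
        d = np.zeros_like(logits); d[idx] = eps
        pp = np.exp(logits + d); pp /= pp.sum(axis=1, keepdims=True)
        pm = np.exp(logits - d); pm /= pm.sum(axis=1, keepdims=True)
        grad[idx] = (mu @ policy_value(worst, R, pp, gamma)
                     - mu @ policy_value(worst, R, pm, gamma)) / (2 * eps)
    return logits + lr * grad

# toy usage: 2 states, 2 actions, 2 candidate kernels
rng = np.random.default_rng(1)
kernels = [rng.dirichlet(np.ones(2), size=(2, 2)) for _ in range(2)]
R = rng.random((2, 2)); mu = np.array([0.5, 0.5])
logits = robust_pg_step(np.zeros((2, 2)), kernels, R, mu)
```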

Time-Constrained Robust MDPs

A Zouitine, D Bertoin, P Clavier, M Geist… - arXiv preprint arXiv …, 2024 - arxiv.org
Robust reinforcement learning is essential for deploying reinforcement learning algorithms
in real-world scenarios where environmental uncertainty predominates. Traditional robust …

Bootstrapping Expectiles in Reinforcement Learning

P Clavier, E Rachelson, EL Pennec, M Geist - arXiv preprint arXiv …, 2024 - arxiv.org
Many classic Reinforcement Learning (RL) algorithms rely on a Bellman operator, which
involves an expectation over the next states, leading to the concept of bootstrapping. To …
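
One way such work replaces the expectation in the Bellman target is with an expectile. A minimal, self-contained sketch of computing a tau-expectile of a sample by asymmetric least squares (illustrative, not the paper's estimator):

```python
import numpy as np

def expectile(x, tau=0.5, n_iter=100):
    """tau-expectile of a sample: the minimiser of the asymmetric
    least-squares loss sum_i |tau - 1{x_i <= m}| * (x_i - m)^2,
    computed by a reweighted-mean fixed-point iteration.  tau = 0.5
    recovers the ordinary mean; tau < 0.5 bends toward low outcomes,
    which is what makes expectiles attractive for pessimistic
    bootstrapping targets.
    """
    x = np.asarray(x, dtype=float)
    m = x.mean()
    for _ in range(n_iter):
        w = np.where(x > m, tau, 1.0 - tau)
        m = (w * x).sum() / w.sum()
    return m

samples = np.array([0.0, 1.0, 1.0, 5.0])
print(expectile(samples, tau=0.5))   # 1.75, the plain mean
print(expectile(samples, tau=0.1))   # pessimistic estimate below the mean
```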

Bounding the Difference between the Values of Robust and Non-Robust Markov Decision Problems

A Neufeld, J Sester - arXiv preprint arXiv:2308.05520, 2023 - arxiv.org
In this note we provide an upper bound for the difference between the value function of a
distributionally robust Markov decision problem and the value function of a non-robust …
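
Schematically, such bounds tend to take the following shape when the ambiguity set is a Wasserstein ball of radius epsilon around the nominal kernel, gamma is the discount factor, and the value function is L_V-Lipschitz; the constants below are illustrative, not the paper's statement.

```latex
\[
  \big|\, V^{\mathrm{robust}}(s) - V^{\mathrm{nominal}}(s) \,\big|
  \;\le\; \frac{\gamma\, L_V\, \varepsilon}{1 - \gamma}
  \qquad \text{for all states } s .
\]
```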

Accelerated Policy Gradient for s-rectangular Robust MDPs with Large State Spaces

Z Chen, H Huang - Forty-first International Conference on Machine … - openreview.net
Robust Markov decision processes (robust MDPs) are an important machine learning framework
for learning reliable policies that are robust to environmental perturbation. Despite empirical …

Soft Robust MDPs and Risk-Sensitive MDPs: Equivalence, Policy Gradient, and Sample Complexity

R Zhang, Y Hu, N Li - The Twelfth International Conference on Learning … - openreview.net
Robust Markov Decision Processes (MDPs) and risk-sensitive MDPs are both powerful tools
for making decisions in the presence of uncertainties. Previous efforts have aimed to …
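
Background on the kind of equivalence at play: a KL-penalised worst case over models equals an entropic risk measure under the nominal model, by the classical Donsker-Varadhan duality. This is stated here as standard background, not as the paper's result.

```latex
\[
  \inf_{Q \ll P} \Big\{ \mathbb{E}_{Q}[X]
  + \tfrac{1}{\lambda}\, \mathrm{KL}(Q \,\|\, P) \Big\}
  \;=\; -\tfrac{1}{\lambda} \log \mathbb{E}_{P}\big[ e^{-\lambda X} \big],
  \qquad \lambda > 0 .
\]
```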

Robust Reinforcement Learning with General Utility

Z Chen, Y Wen, Z Hu, H Huang - The Thirty-eighth Annual Conference on … - openreview.net
The reinforcement learning (RL) problem with general utility is a powerful decision-making
framework that covers standard RL with cumulative cost, exploration problems, and …
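
"General utility" in this literature typically means an objective that is a function of the state-action occupancy measure rather than a linear one; the formulation below is schematic background, not necessarily the paper's exact setup.

```latex
\[
  \max_{\pi}\; F\big(\lambda^{\pi}\big),
  \qquad
  \lambda^{\pi}(s,a) \;=\; (1-\gamma) \sum_{t \ge 0} \gamma^{t}\,
  \mathbb{P}^{\pi}\big(s_t = s,\, a_t = a\big),
\]
% Standard discounted RL is recovered (up to the 1-gamma normalisation)
% when F is linear, F(\lambda) = \langle \lambda, r \rangle.
```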