Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity

L Shi, G Li, Y Wei, Y Chen… - … conference on machine …, 2022 - proceedings.mlr.press
Offline or batch reinforcement learning seeks to learn a near-optimal policy from historical
data without active exploration of the environment. To counter the insufficient coverage and …

Settling the sample complexity of model-based offline reinforcement learning

G Li, L Shi, Y Chen, Y Chi, Y Wei - The Annals of Statistics, 2024 - projecteuclid.org
The Annals of Statistics, 2024, Vol. 52, No. 1, 233–260. https://doi.org/10.1214/23-AOS2342

Corruption-robust offline reinforcement learning with general function approximation

C Ye, R Yang, Q Gu, T Zhang - Advances in Neural …, 2024 - proceedings.neurips.cc
We investigate the problem of corruption robustness in offline reinforcement learning (RL)
with general function approximation, where an adversary can corrupt each sample in the …

Near-optimal offline reinforcement learning with linear representation: Leveraging variance information with pessimism

M Yin, Y Duan, M Wang, YX Wang - arXiv preprint arXiv:2203.05804, 2022 - arxiv.org
Offline reinforcement learning, which seeks to utilize offline/historical data to optimize
sequential decision-making strategies, has gained surging prominence in recent studies …

Nearly minimax optimal offline reinforcement learning with linear function approximation: Single-agent MDP and Markov game

W Xiong, H Zhong, C Shi, C Shen, L Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Offline reinforcement learning (RL) aims at learning an optimal strategy using a pre-
collected dataset without further interactions with the environment. While various algorithms …

Pessimistic minimax value iteration: Provably efficient equilibrium learning from offline datasets

H Zhong, W Xiong, J Tan, L Wang… - International …, 2022 - proceedings.mlr.press
We study episodic two-player zero-sum Markov games (MGs) in the offline setting, where the
goal is to find an approximate Nash equilibrium (NE) policy pair based on a dataset …

The efficacy of pessimism in asynchronous Q-learning

Y Yan, G Li, Y Chen, J Fan - IEEE Transactions on Information …, 2023 - ieeexplore.ieee.org
This paper is concerned with the asynchronous form of Q-learning, which applies a
stochastic approximation scheme to Markovian data samples. Motivated by the recent …

Reinforcement learning in low-rank MDPs with density features

A Huang, J Chen, N Jiang - International Conference on …, 2023 - proceedings.mlr.press
MDPs with low-rank transitions, in which the transition matrix factors into the product of two
matrices (a left factor and a right factor), form a highly representative structure that enables tractable …
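
For reference, the factorization described in this snippet is usually written as follows; this is only an illustrative sketch in standard low-rank MDP notation, and the feature maps phi and mu and the rank d are assumed notation rather than taken from this entry (the "density features" in the title plausibly correspond to the right factor mu):

\[
  P(s' \mid s, a) \;=\; \big\langle \phi(s, a),\, \mu(s') \big\rangle
  \;=\; \sum_{i=1}^{d} \phi_i(s, a)\, \mu_i(s'),
  \qquad \text{equivalently } P = \Phi M,
\]

where \(\Phi\) (the left factor, one row per state-action pair) and \(M\) (the right factor, one column per next state) are the two matrices whose product gives the transition matrix \(P\).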

Double pessimism is provably efficient for distributionally robust offline reinforcement learning: Generic algorithm and robust partial coverage

J Blanchet, M Lu, T Zhang… - Advances in Neural …, 2024 - proceedings.neurips.cc
We study distributionally robust offline reinforcement learning (RL), which seeks to find, purely
from an offline dataset, an optimal robust policy that performs well in perturbed …

When are offline two-player zero-sum Markov games solvable?

Q Cui, SS Du - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
We study what dataset assumption permits solving offline two-player zero-sum Markov
games. In stark contrast to the offline single-agent Markov decision process, we show that …