Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity
Offline or batch reinforcement learning seeks to learn a near-optimal policy using historical
data without active exploration of the environment. To counter the insufficient coverage and …
Settling the sample complexity of model-based offline reinforcement learning
Annals of Statistics 2024, Vol. 52, No. 1, 233–260 https://doi.org/10.1214/23-AOS2342 © …
Corruption-robust offline reinforcement learning with general function approximation
We investigate the problem of corruption robustness in offline reinforcement learning (RL)
with general function approximation, where an adversary can corrupt each sample in the …
Near-optimal offline reinforcement learning with linear representation: Leveraging variance information with pessimism
Offline reinforcement learning, which seeks to utilize offline/historical data to optimize
sequential decision-making strategies, has gained surging prominence in recent studies …
Nearly minimax optimal offline reinforcement learning with linear function approximation: Single-agent MDP and Markov game
Offline reinforcement learning (RL) aims at learning an optimal strategy using a pre-
collected dataset without further interactions with the environment. While various algorithms …
Pessimistic minimax value iteration: Provably efficient equilibrium learning from offline datasets
We study episodic two-player zero-sum Markov games (MGs) in the offline setting, where the
goal is to find an approximate Nash equilibrium (NE) policy pair based on a dataset …
The efficacy of pessimism in asynchronous Q-learning
This paper is concerned with the asynchronous form of Q-learning, which applies a
stochastic approximation scheme to Markovian data samples. Motivated by the recent …
Reinforcement learning in low-rank MDPs with density features
MDPs with low-rank transitions (that is, the transition matrix can be factored into the product
of two matrices, left and right) form a highly representative structure that enables tractable …
Double pessimism is provably efficient for distributionally robust offline reinforcement learning: Generic algorithm and robust partial coverage
We study distributionally robust offline reinforcement learning (RL), which seeks to find an
optimal robust policy purely from an offline dataset that can perform well in perturbed …
When are offline two-player zero-sum Markov games solvable?
We study what dataset assumption permits solving offline two-player zero-sum Markov
games. In stark contrast to the offline single-agent Markov decision process, we show that …