- Academic resource search

Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity

L Shi, G Li, Y Wei, Y Chen… - … conference on machine …, 2022 - proceedings.mlr.press

Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …

Cited by 97 · Related articles · All 10 versions

[PDF] neurips.cc

The curious price of distributional robustness in reinforcement learning with a generative model

L Shi, G Li, Y Wei, Y Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc

This paper investigates model robustness in reinforcement learning (RL) via the framework
of distributionally robust Markov decision processes (RMDPs). Despite recent efforts, the …

Cited by 28 · Related articles · All 10 versions

[PDF] arxiv.org

Settling the sample complexity of model-based offline reinforcement learning

G Li, L Shi, Y Chen, Y Chi, Y Wei - The Annals of Statistics, 2024 - projecteuclid.org

The Annals of Statistics 2024, Vol. 52, No. 1, 233–260. https://doi.org/10.1214/23-AOS2342 © …

Cited by 80 · Related articles · All 8 versions

[PDF] mlr.press

Federated reinforcement learning: Linear speedup under Markovian sampling

S Khodadadian, P Sharma, G Joshi… - International …, 2022 - proceedings.mlr.press

Since reinforcement learning algorithms are notoriously data-intensive, the task of sampling
observations from the environment is usually split across multiple agents. However …

Cited by 62 · Related articles · All 7 versions

[PDF] neurips.cc

Breaking the sample size barrier in model-based reinforcement learning with a generative model

G Li, Y Wei, Y Chi, Y Gu… - Advances in neural …, 2020 - proceedings.neurips.cc

We investigate the sample efficiency of reinforcement learning in a $\gamma$-discounted
infinite-horizon Markov decision process (MDP) with state space S and action space A …

Cited by 135 · Related articles · All 10 versions

[HTML] informs.org

Is Q-learning minimax optimal? A tight sample complexity analysis

G Li, C Cai, Y Chen, Y Wei, Y Chi - Operations Research, 2024 - pubsonline.informs.org

Q-learning, which seeks to learn the optimal Q-function of a Markov decision process (MDP)
in a model-free fashion, lies at the heart of reinforcement learning. When it comes to the …

Cited by 85 · Related articles · All 11 versions

[PDF] mlr.press

The blessing of heterogeneity in federated Q-learning: Linear speedup and beyond

J Woo, G Joshi, Y Chi - International Conference on …, 2023 - proceedings.mlr.press

In this paper, we consider federated Q-learning, which aims to learn an optimal Q-function
by periodically aggregating local Q-estimates trained on local data alone. Focusing on …

Cited by 18 · Related articles · All 9 versions

[PDF] ieee.org

The efficacy of pessimism in asynchronous Q-learning

Y Yan, G Li, Y Chen, J Fan - IEEE Transactions on Information …, 2023 - ieeexplore.ieee.org

This paper is concerned with the asynchronous form of Q-learning, which applies a
stochastic approximation scheme to Markovian data samples. Motivated by the recent …

Cited by 53 · Related articles · All 8 versions

[PDF] mlr.press

A finite sample complexity bound for distributionally robust Q-learning

S Wang, N Si, J Blanchet… - … Conference on Artificial …, 2023 - proceedings.mlr.press

We consider a reinforcement learning setting in which the deployment environment is
different from the training environment. Applying a robust Markov decision processes …

Cited by 22 · Related articles · All 3 versions

[PDF] neurips.cc

Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning

G Li, L Shi, Y Chen, Y Gu, Y Chi - Advances in Neural …, 2021 - proceedings.neurips.cc

Achieving sample efficiency in online episodic reinforcement learning (RL) requires
optimally balancing exploration and exploitation. When it comes to a finite-horizon episodic …

Cited by 51 · Related articles · All 13 versions