Q-learning with logarithmic regret

A Zhang, ZC Lipton, M Li, AJ Smola - arXiv preprint arXiv:2106.11342, 2021 - arxiv.org

This open-source book represents our attempt to make deep learning approachable,
teaching readers the concepts, the context, and the code. The entire book is drafted in …

Cited by 1197 - Related articles - All 9 versions

[PDF] mlr.press

Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity

L Shi, G Li, Y Wei, Y Chen… - … conference on machine …, 2022 - proceedings.mlr.press

Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …

Cited by 103 - Related articles - All 10 versions

[PDF] mlr.press

Nearly minimax optimal reinforcement learning for linear mixture Markov decision processes

D Zhou, Q Gu, C Szepesvari - Conference on Learning …, 2021 - proceedings.mlr.press

We study reinforcement learning (RL) with linear function approximation where the
underlying transition probability kernel of the Markov decision process (MDP) is a linear …

Cited by 239 - Related articles - All 7 versions

[PDF] neurips.cc

The curious price of distributional robustness in reinforcement learning with a generative model

L Shi, G Li, Y Wei, Y Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc

This paper investigates model robustness in reinforcement learning (RL) via the framework
of distributionally robust Markov decision processes (RMDPs). Despite recent efforts, the …

Cited by 37 - Related articles - All 10 versions

[PDF] mlr.press

Leveraging offline data in online reinforcement learning

A Wagenmaker, A Pacchiano - International Conference on …, 2023 - proceedings.mlr.press

Two central paradigms have emerged in the reinforcement learning (RL) community: online
RL and offline RL. In the online RL setting, the agent has no prior knowledge of the …

Cited by 47 - Related articles - All 6 versions

[PDF] neurips.cc

Unpacking reward shaping: Understanding the benefits of reward engineering on sample complexity

A Gupta, A Pacchiano, Y Zhai… - Advances in Neural …, 2022 - proceedings.neurips.cc

The success of reinforcement learning in a variety of challenging sequential
decision-making problems has been much discussed, but often ignored in this discussion is the …

Cited by 62 - Related articles - All 8 versions

[PDF] mlr.press

Is reinforcement learning more difficult than bandits? A near-optimal algorithm escaping the curse of horizon

Z Zhang, X Ji, S Du - Conference on Learning Theory, 2021 - proceedings.mlr.press

Episodic reinforcement learning and contextual bandits are two widely studied sequential
decision-making problems. Episodic reinforcement learning generalizes contextual bandits …

Cited by 125 - Related articles - All 4 versions

[PDF] mlr.press

The blessing of heterogeneity in federated Q-learning: Linear speedup and beyond

J Woo, G Joshi, Y Chi - International Conference on …, 2023 - proceedings.mlr.press

In this paper, we consider federated Q-learning, which aims to learn an optimal Q-function
by periodically aggregating local Q-estimates trained on local data alone. Focusing on …

Cited by 22 - Related articles - All 9 versions

[PDF] ieee.org

The efficacy of pessimism in asynchronous Q-learning

Y Yan, G Li, Y Chen, J Fan - IEEE Transactions on Information …, 2023 - ieeexplore.ieee.org

This paper is concerned with the asynchronous form of Q-learning, which applies a
stochastic approximation scheme to Markovian data samples. Motivated by the recent …

Cited by 57 - Related articles - All 8 versions

[PDF] mlr.press

Revisiting the linear-programming framework for offline RL with general function approximation

AE Ozdaglar, S Pattathil, J Zhang… - … on Machine Learning, 2023 - proceedings.mlr.press

Offline reinforcement learning (RL) aims to find an optimal policy for sequential
decision-making using a pre-collected dataset, without further interaction with the environment …

Cited by 27 - Related articles - All 6 versions