Non-asymptotic gap-dependent regret bounds for tabular MDPs

A Zhang, ZC Lipton, M Li, AJ Smola - arXiv preprint arXiv:2106.11342, 2021 - arxiv.org

This open-source book represents our attempt to make deep learning approachable,
teaching readers the concepts, the context, and the code. The entire book is drafted in …

Cited by 1070 - Related articles - All 9 versions

Nearly minimax optimal reinforcement learning for linear mixture Markov decision processes

D Zhou, Q Gu, C Szepesvari - Conference on Learning …, 2021 - proceedings.mlr.press

We study reinforcement learning (RL) with linear function approximation where the
underlying transition probability kernel of the Markov decision process (MDP) is a linear …

Cited by 217 - Related articles - All 7 versions

Nearly minimax optimal reinforcement learning for linear Markov decision processes

J He, H Zhao, D Zhou, Q Gu - International Conference on …, 2023 - proceedings.mlr.press

We study reinforcement learning (RL) with linear function approximation. For episodic time-
inhomogeneous linear Markov decision processes (linear MDPs) whose transition …

Cited by 49 - Related articles - All 7 versions

Learning near optimal policies with low inherent Bellman error

A Zanette, A Lazaric, M Kochenderfer… - International …, 2020 - proceedings.mlr.press

We study the exploration problem with approximate linear action-value functions in episodic
reinforcement learning under the notion of low inherent Bellman error, a condition normally …

Cited by 229 - Related articles - All 5 versions

Reward-free exploration for reinforcement learning

C Jin, A Krishnamurthy… - … on Machine Learning, 2020 - proceedings.mlr.press

Exploration is widely regarded as one of the most challenging aspects of reinforcement
learning (RL), with many naive approaches succumbing to exponential sample complexity …

Cited by 236 - Related articles - All 6 versions

Leveraging offline data in online reinforcement learning

A Wagenmaker, A Pacchiano - International Conference on …, 2023 - proceedings.mlr.press

Two central paradigms have emerged in the reinforcement learning (RL) community: online
RL and offline RL. In the online RL setting, the agent has no prior knowledge of the …

Cited by 39 - Related articles - All 6 versions

Guarantees for epsilon-greedy reinforcement learning with function approximation

C Dann, Y Mansour, M Mohri… - International …, 2022 - proceedings.mlr.press

Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian noise fail to
explore efficiently in some reinforcement learning tasks and yet, they perform well in many …

Cited by 45 - Related articles - All 6 versions

Almost optimal model-free reinforcement learning via reference-advantage decomposition

Z Zhang, Y Zhou, X Ji - Advances in Neural Information …, 2020 - proceedings.neurips.cc

We study the reinforcement learning problem in the setting of finite-horizon episodic Markov
Decision Processes (MDPs) with S states, A actions, and episode length H. We propose a …

Cited by 165 - Related articles - All 8 versions

Is reinforcement learning more difficult than bandits? a near-optimal algorithm escaping the curse of horizon

Z Zhang, X Ji, S Du - Conference on Learning Theory, 2021 - proceedings.mlr.press

Episodic reinforcement learning and contextual bandits are two widely studied sequential
decision-making problems. Episodic reinforcement learning generalizes contextual bandits …

Cited by 113 - Related articles - All 4 versions

Optimism in reinforcement learning with generalized linear function approximation

Y Wang, R Wang, SS Du, A Krishnamurthy - arXiv preprint arXiv …, 2019 - arxiv.org

We design a new provably efficient algorithm for episodic reinforcement learning with
generalized linear function approximation. We analyze the algorithm under a new …

Cited by 165 - Related articles - All 3 versions