Learning near optimal policies with low inherent Bellman error

A Zanette, A Lazaric, M Kochenderfer… - International …, 2020 - proceedings.mlr.press
We study the exploration problem with approximate linear action-value functions in episodic
reinforcement learning under the notion of low inherent Bellman error, a condition normally …
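Aside (not part of the snippet): a hedged reconstruction of the condition in the title, in standard linear finite-horizon notation; the symbols below are assumptions of this sketch, not quotes from the paper. Low inherent Bellman error asks that the Bellman backup of any representable action-value function be itself nearly representable one step earlier:

```latex
% I: inherent Bellman error; T_h: step-h Bellman operator;
% phi_h: feature map; B_h: parameter set at step h (all assumed notation).
\[
\mathcal{I} \;=\; \max_{h}\;
\sup_{\theta' \in \mathcal{B}_{h+1}}\;
\inf_{\theta \in \mathcal{B}_{h}}\;
\bigl\| \langle \phi_h(\cdot,\cdot), \theta \rangle
  \;-\; \mathcal{T}_h \langle \phi_{h+1}(\cdot,\cdot), \theta' \rangle \bigr\|_{\infty}
\]
```

A zero value recovers the exactly-linear setting; the "low" regime allows \(\mathcal{I}\) to be small but nonzero.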

Conservative safety critics for exploration

H Bharadhwaj, A Kumar, N Rhinehart, S Levine… - arXiv preprint arXiv …, 2020 - arxiv.org
Safe exploration presents a major challenge in reinforcement learning (RL): when active
data collection requires deploying partially trained policies, we must ensure that these …

Deep exploration via randomized value functions

I Osband, B Van Roy, DJ Russo, Z Wen - Journal of Machine Learning …, 2019 - jmlr.org
We study the use of randomized value functions to guide deep exploration in reinforcement
learning. This offers an elegant means for synthesizing statistically and computationally …

PC-PG: Policy cover directed exploration for provable policy gradient learning

A Agarwal, M Henaff, S Kakade… - Advances in neural …, 2020 - proceedings.neurips.cc
Direct policy gradient methods for reinforcement learning are a successful approach for a
variety of reasons: they are model-free, they directly optimize the performance metric of …

Is reinforcement learning more difficult than bandits? A near-optimal algorithm escaping the curse of horizon

Z Zhang, X Ji, S Du - Conference on Learning Theory, 2021 - proceedings.mlr.press
Episodic reinforcement learning and contextual bandits are two widely studied sequential
decision-making problems. Episodic reinforcement learning generalizes contextual bandits …

Learning zero-sum simultaneous-move Markov games using function approximation and correlated equilibrium

Q Xie, Y Chen, Z Wang, Z Yang - Conference on learning …, 2020 - proceedings.mlr.press
In this work, we develop provably efficient reinforcement learning algorithms for two-player
zero-sum Markov games with simultaneous moves. We consider a family of Markov games …
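Aside: the computational core of value iteration in zero-sum games is a per-state matrix-game solve. Below is a minimal Python sketch of the classical maximin LP for the row player; the paper itself works with a correlated-equilibrium subroutine, so this simpler LP is an illustrative stand-in, and `matrix_game_value` is a name invented here.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(Q):
    """Q[i, j] = payoff to the row player. Returns (value, row strategy)."""
    m, n = Q.shape
    # Variables: x (row mixed strategy, length m) and the game value v.
    c = np.zeros(m + 1)
    c[-1] = -1.0                                   # maximize v == minimize -v
    # For every column j: v - sum_i x_i Q[i, j] <= 0
    A_ub = np.hstack([-Q.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0.0                              # probabilities sum to 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]      # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]
```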

Frequentist regret bounds for randomized least-squares value iteration

A Zanette, D Brandfonbrener… - International …, 2020 - proceedings.mlr.press
We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning
(RL). When the state space is large or continuous, traditional tabular approaches are …
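Aside: a minimal sketch of the randomized least-squares value iteration (RLSVI) idea the title analyzes, assuming linear features and Gaussian perturbations; the array shapes, noise scale `sigma`, and ridge parameter `lam` are illustrative choices of this sketch, not the paper's tuning.

```python
import numpy as np

def rlsvi_backup(features, rewards, next_features, H, d, sigma=1.0, lam=1.0):
    """One round of randomized least-squares backups over logged data.

    features[h]      : (n_h, d)    phi(s, a) for visited pairs at step h
    rewards[h]       : (n_h,)      observed rewards at step h
    next_features[h] : (n_h, A, d) features of every action at the next state
    Returns one perturbed weight vector per step.
    """
    theta = [np.zeros(d) for _ in range(H + 1)]    # theta[H] = 0 (terminal)
    for h in reversed(range(H)):
        Phi, r = features[h], rewards[h]
        # Backed-up targets: r + max_a phi(s', a)^T theta_{h+1}
        q_next = next_features[h] @ theta[h + 1]   # (n_h, A)
        y = r + q_next.max(axis=1)
        # Ridge regression on Gaussian-perturbed targets
        gram = Phi.T @ Phi + lam * np.eye(d)
        noise = np.random.randn(len(y)) * sigma
        theta_hat = np.linalg.solve(gram, Phi.T @ (y + noise))
        # Posterior-style perturbation of the solution itself
        theta[h] = theta_hat + np.random.multivariate_normal(
            np.zeros(d), sigma**2 * np.linalg.inv(gram))
    return theta[:H]
```

Acting greedily with respect to the perturbed weights yields randomized, optimism-like exploration without explicit bonuses.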

Model-based RL with optimistic posterior sampling: Structural conditions and sample complexity

A Agarwal, T Zhang - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We propose a general framework to design posterior sampling methods for model-based
RL. We show that the proposed algorithms can be analyzed by reducing regret to Hellinger …
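Aside: the framework generalizes the classical sample-then-plan loop of posterior sampling. Below is a minimal tabular sketch of that loop; the Dirichlet transition posteriors, Gaussian reward posteriors, and function name are assumptions of this sketch, not the paper's algorithm, which adds optimism on top.

```python
import numpy as np

def sample_and_plan(counts, reward_sums, H):
    """counts      : (S, A, S) observed transition counts
       reward_sums : (S, A, 2) per-(s, a) [sum of rewards, number of visits]
       Returns a greedy policy for one MDP sampled from the posterior."""
    S, A, _ = counts.shape
    # Sample transition kernels from Dirichlet(1 + counts) posteriors
    P = np.array([[np.random.dirichlet(1.0 + counts[s, a])
                   for a in range(A)] for s in range(S)])
    # Sample mean rewards from a Normal posterior (unit prior, unit noise)
    n = reward_sums[..., 1]
    R = np.random.normal(reward_sums[..., 0] / (n + 1.0),
                         1.0 / np.sqrt(n + 1.0))
    # Finite-horizon value iteration on the sampled model
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = R + P @ V                              # (S, A)
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy
```

Each episode resamples a model and follows the resulting greedy policy, so exploration comes from posterior uncertainty rather than explicit bonuses.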

Corruption-robust exploration in episodic reinforcement learning

T Lykouris, M Simchowitz… - … on Learning Theory, 2021 - proceedings.mlr.press
We initiate the study of episodic reinforcement learning under adversarial corruptions in both
the rewards and the transition probabilities of the underlying system, extending recent results …

On function approximation in reinforcement learning: Optimism in the face of large state spaces

Z Yang, C Jin, Z Wang, M Wang, MI Jordan - arXiv preprint arXiv …, 2020 - arxiv.org
The classical theory of reinforcement learning (RL) has focused on tabular and linear
representations of value functions. Further progress hinges on combining RL with modern …
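Aside: the "optimism" in this line of work typically takes the form of a least-squares value fit plus an elliptical confidence bonus. A minimal linear-feature sketch follows; `beta`, `lam`, and the function name are illustrative, and the paper's contribution is extending this mechanism to kernel and neural function classes.

```python
import numpy as np

def optimistic_q(Phi, targets, phi_query, lam=1.0, beta=1.0):
    """Phi: (n, d) features of visited (s, a); targets: (n,) regression
    targets r + V_{h+1}(s'); phi_query: (d,) feature of the pair to evaluate.
    Returns the optimistic Q estimate: linear fit plus exploration bonus."""
    d = Phi.shape[1]
    gram = Phi.T @ Phi + lam * np.eye(d)           # regularized Gram matrix
    w = np.linalg.solve(gram, Phi.T @ targets)     # least-squares weights
    # Elliptical bonus: beta * sqrt(phi^T Gram^{-1} phi)
    bonus = beta * np.sqrt(phi_query @ np.linalg.solve(gram, phi_query))
    return float(phi_query @ w + bonus)
```

The bonus shrinks in well-explored feature directions and stays large in unexplored ones, which is what drives directed exploration under optimism.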