Provably efficient reinforcement learning with general value function approximation

C Jin, Q Liu, S Miryoosefi - Advances in neural information …, 2021 - proceedings.neurips.cc

Finding the minimal structural assumptions that empower sample-efficient learning is one of
the most important research directions in Reinforcement Learning (RL). This paper …

Cited by 263 Related articles All 11 versions

[PDF] neurips.cc

Efficient model-free exploration in low-rank MDPs

Z Mhammedi, A Block, DJ Foster… - Advances in Neural …, 2024 - proceedings.neurips.cc

A major challenge in reinforcement learning is to develop practical, sample-efficient
algorithms for exploration in high-dimensional domains where generalization and function …

Cited by 16 Related articles All 5 versions

[PDF] jmlr.org

Model-free representation learning and exploration in low-rank MDPs

A Modi, J Chen, A Krishnamurthy, N Jiang… - Journal of Machine …, 2024 - jmlr.org

The low-rank MDP has emerged as an important model for studying representation learning
and exploration in reinforcement learning. With a known representation, several model-free …

Cited by 95 Related articles All 5 versions

[PDF] neurips.cc

On reward-free reinforcement learning with linear function approximation

R Wang, SS Du, L Yang… - Advances in neural …, 2020 - proceedings.neurips.cc

Reward-free reinforcement learning (RL) is a framework which is suitable for both the batch
RL setting and the setting where there are many reward functions of interest. During the …

Cited by 130 Related articles All 6 versions

[PDF] mlr.press

The power of exploiter: Provable multi-agent RL in large state spaces

C Jin, Q Liu, T Yu - International Conference on Machine …, 2022 - proceedings.mlr.press

Modern reinforcement learning (RL) commonly engages practical problems with large state
spaces, where function approximation must be deployed to approximate either the value …

Cited by 71 Related articles All 7 versions

[PDF] arxiv.org

Instance-dependent complexity of contextual bandits and reinforcement learning: A disagreement-based perspective

DJ Foster, A Rakhlin, D Simchi-Levi, Y Xu - arXiv preprint arXiv …, 2020 - arxiv.org

In the classical multi-armed bandit problem, instance-dependent algorithms attain improved
performance on "easy" problems with a gap between the best and second-best arm. Are …

Cited by 100 Related articles All 4 versions

[PDF] arxiv.org

On function approximation in reinforcement learning: Optimism in the face of large state spaces

Z Yang, C Jin, Z Wang, M Wang, MI Jordan - arXiv preprint arXiv …, 2020 - arxiv.org

The classical theory of reinforcement learning (RL) has focused on tabular and linear
representations of value functions. Further progress hinges on combining RL with modern …

Cited by 88 Related articles All 6 versions

[PDF] arxiv.org

Towards general function approximation in zero-sum Markov games

B Huang, JD Lee, Z Wang, Z Yang - arXiv preprint arXiv:2107.14702, 2021 - arxiv.org

This paper considers two-player zero-sum finite-horizon Markov games with simultaneous
moves. The study focuses on the challenging settings where the value function or the model …

Cited by 59 Related articles All 5 versions

[PDF] neurips.cc

A provably efficient model-free posterior sampling method for episodic reinforcement learning

C Dann, M Mohri, T Zhang… - Advances in Neural …, 2021 - proceedings.neurips.cc

Thompson Sampling is one of the most effective methods for contextual bandits and has
been generalized to posterior sampling for certain MDP settings. However, existing posterior …

Cited by 43 Related articles All 9 versions

[PDF] mlr.press

Risk-sensitive reinforcement learning with function approximation: A debiasing approach

Y Fei, Z Yang, Z Wang - International Conference on …, 2021 - proceedings.mlr.press

We study function approximation for episodic reinforcement learning with entropic risk
measure. We first propose an algorithm with linear function approximation. Compared to …

Cited by 49 Related articles All 4 versions