Bellman eluder dimension: New rich classes of RL problems, and sample-efficient algorithms

C Jin, Q Liu, S Miryoosefi - Advances in neural information …, 2021 - proceedings.neurips.cc
Finding the minimal structural assumptions that empower sample-efficient learning is one of
the most important research directions in Reinforcement Learning (RL). This paper …

Efficient model-free exploration in low-rank MDPs

Z Mhammedi, A Block, DJ Foster… - Advances in Neural …, 2024 - proceedings.neurips.cc
A major challenge in reinforcement learning is to develop practical, sample-efficient
algorithms for exploration in high-dimensional domains where generalization and function …

Model-free representation learning and exploration in low-rank MDPs

A Modi, J Chen, A Krishnamurthy, N Jiang… - Journal of Machine …, 2024 - jmlr.org
The low-rank MDP has emerged as an important model for studying representation learning
and exploration in reinforcement learning. With a known representation, several model-free …

On reward-free reinforcement learning with linear function approximation

R Wang, SS Du, L Yang… - Advances in neural …, 2020 - proceedings.neurips.cc
Reward-free reinforcement learning (RL) is a framework which is suitable for both the batch
RL setting and the setting where there are many reward functions of interest. During the …

The power of exploiter: Provable multi-agent RL in large state spaces

C Jin, Q Liu, T Yu - International Conference on Machine …, 2022 - proceedings.mlr.press
Modern reinforcement learning (RL) commonly engages practical problems with large state
spaces, where function approximation must be deployed to approximate either the value …

Instance-dependent complexity of contextual bandits and reinforcement learning: A disagreement-based perspective

DJ Foster, A Rakhlin, D Simchi-Levi, Y Xu - arXiv preprint arXiv …, 2020 - arxiv.org
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved
performance on "easy" problems with a gap between the best and second-best arm. Are …

On function approximation in reinforcement learning: Optimism in the face of large state spaces

Z Yang, C Jin, Z Wang, M Wang, MI Jordan - arXiv preprint arXiv …, 2020 - arxiv.org
The classical theory of reinforcement learning (RL) has focused on tabular and linear
representations of value functions. Further progress hinges on combining RL with modern …

Towards general function approximation in zero-sum Markov games

B Huang, JD Lee, Z Wang, Z Yang - arXiv preprint arXiv:2107.14702, 2021 - arxiv.org
This paper considers two-player zero-sum finite-horizon Markov games with simultaneous
moves. The study focuses on the challenging settings where the value function or the model …

A provably efficient model-free posterior sampling method for episodic reinforcement learning

C Dann, M Mohri, T Zhang… - Advances in Neural …, 2021 - proceedings.neurips.cc
Thompson Sampling is one of the most effective methods for contextual bandits and has
been generalized to posterior sampling for certain MDP settings. However, existing posterior …

Risk-sensitive reinforcement learning with function approximation: A debiasing approach

Y Fei, Z Yang, Z Wang - International Conference on …, 2021 - proceedings.mlr.press
We study function approximation for episodic reinforcement learning with entropic risk
measure. We first propose an algorithm with linear function approximation. Compared to …