Reinforcement learning in linear MDPs: Constant regret and representation selection

X Zhang, Y Song, M Uehara, M Wang… - International …, 2022 - proceedings.mlr.press

We present BRIEE, an algorithm for efficient reinforcement learning in Markov Decision
Processes with block-structured dynamics (i.e., Block MDPs), where rich observations are …

Cited by 70 · Related articles · All 4 versions

[PDF] jmlr.org

Model-free representation learning and exploration in low-rank MDPs

A Modi, J Chen, A Krishnamurthy, N Jiang… - Journal of Machine …, 2024 - jmlr.org

The low-rank MDP has emerged as an important model for studying representation learning
and exploration in reinforcement learning. With a known representation, several model-free …

Cited by 95 · Related articles · All 5 versions

[PDF] mlr.press

Revisiting the linear-programming framework for offline RL with general function approximation

AE Ozdaglar, S Pattathil, J Zhang… - … on Machine Learning, 2023 - proceedings.mlr.press

Offline reinforcement learning (RL) aims to find an optimal policy for sequential decision-
making using a pre-collected dataset, without further interaction with the environment …

Cited by 27 · Related articles · All 6 versions

[PDF] mlr.press

Offline reinforcement learning under value and density-ratio realizability: the power of gaps

J Chen, N Jiang - Uncertainty in Artificial Intelligence, 2022 - proceedings.mlr.press

We consider a challenging theoretical problem in offline reinforcement learning (RL):
obtaining sample-efficiency guarantees with a dataset lacking sufficient coverage, under …

Cited by 43 · Related articles · All 7 versions

[PDF] academia.edu

Structure in reinforcement learning: A survey and open problems

A Mohan, A Zhang, M Lindauer - arXiv preprint arXiv:2306.16021, 2023 - academia.edu

Reinforcement Learning (RL), bolstered by the expressive capabilities of Deep Neural
Networks (DNNs) for function approximation, has demonstrated considerable success in …

Cited by 17 · Related articles · All 2 versions

[PDF] aaai.org

On instance-dependent bounds for offline reinforcement learning with linear function approximation

T Nguyen-Tang, M Yin, S Gupta, S Venkatesh… - Proceedings of the …, 2023 - ojs.aaai.org

Sample-efficient offline reinforcement learning (RL) with linear function approximation has
been studied extensively recently. Much of the prior work has yielded instance-independent …

Cited by 18 · Related articles · All 7 versions

[PDF] neurips.cc

Context-lumpable stochastic bandits

CW Lee, Q Liu, Y Abbasi Yadkori… - Advances in …, 2024 - proceedings.neurips.cc

We consider a contextual bandit problem with $S$ contexts and $K$ actions. In each
round $t = 1, 2, \dots$ the learner observes a random context and chooses an action based …

Cited by 2 · Related articles · All 7 versions

[PDF] neurips.cc

Provable General Function Class Representation Learning in Multitask Bandits and MDP

R Lu, A Zhao, SS Du, G Huang - Advances in Neural …, 2022 - proceedings.neurips.cc

While multitask representation learning has become a popular approach in reinforcement
learning (RL) to boost the sample efficiency, the theoretical understanding of why and how it …

Cited by 6 · Related articles · All 8 versions

[PDF] neurips.cc

Scalable representation learning in linear contextual bandits with constant regret guarantees

A Tirinzoni, M Papini, A Touati… - Advances in Neural …, 2022 - proceedings.neurips.cc

We study the problem of representation learning in stochastic contextual linear bandits.
While the primary concern in this domain is usually to find realizable representations …

Cited by 4 · Related articles · All 8 versions

[PDF] jair.org

Structure in Deep Reinforcement Learning: A Survey and Open Problems

A Mohan, A Zhang, M Lindauer - Journal of Artificial Intelligence Research, 2024 - jair.org

Reinforcement Learning (RL), bolstered by the expressive capabilities of Deep Neural
Networks (DNNs) for function approximation, has demonstrated considerable success in …

Cited by 8 · Related articles · All 5 versions