On instance-dependent bounds for offline reinforcement learning with linear function approximation
Sample-efficient offline reinforcement learning (RL) with linear function approximation has
been studied extensively in recent years. Much of the prior work has yielded instance-independent …
Towards instance-optimal offline reinforcement learning with pessimism
We study the offline reinforcement learning (offline RL) problem, where the goal is to
learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) …
Offline reinforcement learning under value and density-ratio realizability: the power of gaps
We consider a challenging theoretical problem in offline reinforcement learning (RL):
obtaining sample-efficiency guarantees with a dataset lacking sufficient coverage, under …
Importance weighted actor-critic for optimal conservative offline reinforcement learning
We propose A-Crab (Actor-Critic Regularized by Average Bellman error), a new
practical algorithm for offline reinforcement learning (RL) in complex environments with …
On sample-efficient offline reinforcement learning: Data diversity, posterior sampling and beyond
T Nguyen-Tang, R Arora - Advances in neural information …, 2024 - proceedings.neurips.cc
We seek to understand what facilitates sample-efficient learning from historical datasets for
sequential decision-making, a problem that is popularly known as offline reinforcement …
Offline reinforcement learning with realizability and single-policy concentrability
Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong
assumptions on both the function classes (e.g., Bellman-completeness) and the data …
Pessimistic nonlinear least-squares value iteration for offline reinforcement learning
Offline reinforcement learning (RL), where the agent aims to learn the optimal policy based
on the data collected by a behavior policy, has attracted increasing attention in recent years …
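Several of the entries above build on the pessimism principle: lower-bound each value estimate by an uncertainty penalty before acting greedily, so that poorly covered state-action pairs are not over-valued. A minimal tabular sketch in Python, using a hypothetical count-based penalty beta/sqrt(n); this is an illustration of the general idea, not any listed paper's specific algorithm:

```python
import numpy as np

def pessimistic_q(dataset, n_states, n_actions, gamma=0.9, beta=1.0, iters=200):
    """Tabular pessimistic value iteration from logged (s, a, r, s') tuples.

    Fits an empirical model and subtracts a count-based lower-confidence
    penalty beta / sqrt(n); pairs with no data get the most pessimistic
    value (zero here, assuming rewards in [0, 1]).
    """
    counts = np.zeros((n_states, n_actions))
    r_sum = np.zeros((n_states, n_actions))
    p_cnt = np.zeros((n_states, n_actions, n_states))
    for s, a, r, s2 in dataset:
        counts[s, a] += 1
        r_sum[s, a] += r
        p_cnt[s, a, s2] += 1

    q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        v = q.max(axis=1)  # greedy value of the current pessimistic Q
        for s in range(n_states):
            for a in range(n_actions):
                n = counts[s, a]
                if n == 0:
                    q[s, a] = 0.0  # never observed: maximally pessimistic
                    continue
                r_hat = r_sum[s, a] / n          # empirical mean reward
                p_hat = p_cnt[s, a] / n          # empirical transition probs
                bonus = beta / np.sqrt(n)        # uncertainty penalty
                q[s, a] = max(0.0, r_hat - bonus + gamma * p_hat @ v)
    return q
```

On a toy two-state example where action 0 in state 0 is well covered and action 1 is barely covered, the penalty drives the policy toward the well-supported action, which is the behavior these papers analyze.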
On gap-dependent bounds for offline reinforcement learning
This paper presents a systematic study on gap-dependent sample complexity in offline
reinforcement learning. Prior works showed when the density ratio between an optimal …
Bridging offline reinforcement learning and imitation learning: A tale of pessimism
Offline reinforcement learning (RL) algorithms seek to learn an optimal policy from a fixed
dataset without active data collection. Based on the composition of the offline dataset, two …
Oracle inequalities for model selection in offline reinforcement learning
In offline reinforcement learning (RL), a learner leverages prior logged data to learn a good
policy without interacting with the environment. A major challenge in applying such methods …