On instance-dependent bounds for offline reinforcement learning with linear function approximation

T Nguyen-Tang, M Yin, S Gupta, S Venkatesh… - Proceedings of the …, 2023 - ojs.aaai.org
Sample-efficient offline reinforcement learning (RL) with linear function approximation has
been studied extensively recently. Much of the prior work has yielded instance-independent …

Towards instance-optimal offline reinforcement learning with pessimism

M Yin, YX Wang - Advances in Neural Information …, 2021 - proceedings.neurips.cc
We study the offline reinforcement learning (offline RL) problem, where the goal is to
learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) …

Offline reinforcement learning under value and density-ratio realizability: the power of gaps

J Chen, N Jiang - Uncertainty in Artificial Intelligence, 2022 - proceedings.mlr.press
We consider a challenging theoretical problem in offline reinforcement learning (RL):
obtaining sample-efficiency guarantees with a dataset lacking sufficient coverage, under …

Importance weighted actor-critic for optimal conservative offline reinforcement learning

H Zhu, P Rashidinejad, J Jiao - Advances in Neural …, 2024 - proceedings.neurips.cc
We propose A-Crab (Actor-Critic Regularized by Average Bellman error), a new
practical algorithm for offline reinforcement learning (RL) in complex environments with …

On sample-efficient offline reinforcement learning: Data diversity, posterior sampling and beyond

T Nguyen-Tang, R Arora - Advances in Neural Information …, 2024 - proceedings.neurips.cc
We seek to understand what facilitates sample-efficient learning from historical datasets for
sequential decision-making, a problem that is popularly known as offline reinforcement …

Offline reinforcement learning with realizability and single-policy concentrability

W Zhan, B Huang, A Huang… - … on Learning Theory, 2022 - proceedings.mlr.press
Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong
assumptions on both the function classes (e.g., Bellman-completeness) and the data …

Pessimistic nonlinear least-squares value iteration for offline reinforcement learning

Q Di, H Zhao, J He, Q Gu - arXiv preprint arXiv:2310.01380, 2023 - arxiv.org
Offline reinforcement learning (RL), where the agent aims to learn the optimal policy based
on the data collected by a behavior policy, has attracted increasing attention in recent years …

On gap-dependent bounds for offline reinforcement learning

X Wang, Q Cui, SS Du - Advances in Neural Information …, 2022 - proceedings.neurips.cc
This paper presents a systematic study on gap-dependent sample complexity in offline
reinforcement learning. Prior works showed when the density ratio between an optimal …

Bridging offline reinforcement learning and imitation learning: A tale of pessimism

P Rashidinejad, B Zhu, C Ma, J Jiao… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Offline reinforcement learning (RL) algorithms seek to learn an optimal policy from a fixed
dataset without active data collection. Based on the composition of the offline dataset, two …

Oracle inequalities for model selection in offline reinforcement learning

JN Lee, G Tucker, O Nachum, B Dai… - Advances in Neural …, 2022 - proceedings.neurips.cc
In offline reinforcement learning (RL), a learner leverages prior logged data to learn a good
policy without interacting with the environment. A major challenge in applying such methods …