On instance-dependent bounds for offline reinforcement learning with linear function approximation
Sample-efficient offline reinforcement learning (RL) with linear function approximation has
been studied extensively in recent years. Much of the prior work has yielded instance-independent …
Towards instance-optimal offline reinforcement learning with pessimism
We study the offline reinforcement learning (offline RL) problem, where the goal is to
learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) …
Offline reinforcement learning under value and density-ratio realizability: the power of gaps
We consider a challenging theoretical problem in offline reinforcement learning (RL):
obtaining sample-efficiency guarantees with a dataset lacking sufficient coverage, under …
Importance weighted actor-critic for optimal conservative offline reinforcement learning
We propose A-Crab (Actor-Critic Regularized by Average Bellman error), a new
practical algorithm for offline reinforcement learning (RL) in complex environments with …
On sample-efficient offline reinforcement learning: Data diversity, posterior sampling and beyond
T Nguyen-Tang, R Arora - Advances in neural information …, 2024 - proceedings.neurips.cc
We seek to understand what facilitates sample-efficient learning from historical datasets for
sequential decision-making, a problem that is popularly known as offline reinforcement …
Offline reinforcement learning with realizability and single-policy concentrability
Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong
assumptions on both the function classes (e.g., Bellman-completeness) and the data …
Pessimistic nonlinear least-squares value iteration for offline reinforcement learning
Offline reinforcement learning (RL), where the agent aims to learn the optimal policy based
on the data collected by a behavior policy, has attracted increasing attention in recent years …
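Several of the entries above build on the pessimism principle: lower-bound each value estimate by an uncertainty penalty before acting greedily, so that poorly covered state-action pairs are not over-valued. A minimal tabular sketch in Python, using a hypothetical count-based penalty beta/sqrt(n); this is an illustration of the general idea, not any listed paper's specific algorithm:

```python
import numpy as np

def pessimistic_q(dataset, n_states, n_actions, gamma=0.9, beta=1.0, iters=200):
    """Tabular pessimistic value iteration from logged (s, a, r, s') tuples.

    Fits an empirical model and subtracts a count-based lower-confidence
    penalty beta / sqrt(n); pairs with no data get the most pessimistic
    value (zero here, assuming rewards in [0, 1]).
    """
    counts = np.zeros((n_states, n_actions))
    r_sum = np.zeros((n_states, n_actions))
    p_cnt = np.zeros((n_states, n_actions, n_states))
    for s, a, r, s2 in dataset:
        counts[s, a] += 1
        r_sum[s, a] += r
        p_cnt[s, a, s2] += 1

    q = np.zeros((n_states, n_actions))
    for _ in range(iters):
        v = q.max(axis=1)  # greedy value of the current pessimistic Q
        for s in range(n_states):
            for a in range(n_actions):
                n = counts[s, a]
                if n == 0:
                    q[s, a] = 0.0  # never observed: maximally pessimistic
                    continue
                r_hat = r_sum[s, a] / n          # empirical mean reward
                p_hat = p_cnt[s, a] / n          # empirical transition probs
                bonus = beta / np.sqrt(n)        # uncertainty penalty
                q[s, a] = max(0.0, r_hat - bonus + gamma * p_hat @ v)
    return q
```

On a toy two-state example where action 0 in state 0 is well covered and action 1 is barely covered, the penalty drives the policy toward the well-supported action, which is the behavior these papers analyze.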
On gap-dependent bounds for offline reinforcement learning
This paper presents a systematic study on gap-dependent sample complexity in offline
reinforcement learning. Prior works showed when the density ratio between an optimal …
Bridging offline reinforcement learning and imitation learning: A tale of pessimism
Offline reinforcement learning (RL) algorithms seek to learn an optimal policy from a fixed
dataset without active data collection. Based on the composition of the offline dataset, two …
Oracle inequalities for model selection in offline reinforcement learning
In offline reinforcement learning (RL), a learner leverages prior logged data to learn a good
policy without interacting with the environment. A major challenge in applying such methods …