On sample-efficient offline reinforcement learning: Data diversity, posterior sampling and beyond

T Nguyen-Tang, R Arora - Advances in neural information …, 2024 - proceedings.neurips.cc
We seek to understand what facilitates sample-efficient learning from historical datasets for
sequential decision-making, a problem that is popularly known as offline reinforcement …

Corruption Robust Offline Reinforcement Learning with Human Feedback

D Mandal, A Nika, P Kamalaruban, A Singla… - arXiv preprint arXiv …, 2024 - arxiv.org
We study data corruption robustness for reinforcement learning with human feedback
(RLHF) in an offline setting. Given an offline dataset of pairs of trajectories along with …

A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Low-Rank MDPs

K Hong, A Tewari - arXiv preprint arXiv:2402.04493, 2024 - arxiv.org
Offline reinforcement learning (RL) aims to learn a policy that maximizes the expected
cumulative reward using a pre-collected dataset. Offline RL with low-rank MDPs or general …

The Role of Inherent Bellman Error in Offline Reinforcement Learning with Linear Function Approximation

N Golowich, A Moitra - arXiv preprint arXiv:2406.11686, 2024 - arxiv.org
In this paper, we study the offline RL problem with linear function approximation. Our main
structural assumption is that the MDP has low inherent Bellman error, which stipulates that …

Offline RL via Feature-Occupancy Gradient Ascent

G Neu, N Okolo - arXiv preprint arXiv:2405.13755, 2024 - arxiv.org
We study offline Reinforcement Learning in large infinite-horizon discounted Markov
Decision Processes (MDPs) when the reward and transition models are linearly realizable …

A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs

K Hong, A Tewari - Forty-first International Conference on Machine … - openreview.net
We study offline reinforcement learning (RL) with linear MDPs under the infinite-horizon
discounted setting which aims to learn a policy that maximizes the expected discounted …

Offline Reinforcement Learning via Inverse Optimization

I Dimanidis, T Ok, PM Esfahani - 2024 - dcsc.tudelft.nl
Inspired by the recent successes of Inverse Optimization (IO) across various application
domains, we propose a novel offline Reinforcement Learning (ORL) algorithm for …

Reinforcement learning under general function approximation and novel interaction settings

J Chen - 2023 - ideals.illinois.edu
Reinforcement Learning (RL) is an area of machine learning where an intelligent agent
solves sequential decision-making problems based on experience. Recent advances in the …