Offline primal-dual reinforcement learning for linear MDPs

T Nguyen-Tang, R Arora - Advances in neural information …, 2024 - proceedings.neurips.cc

We seek to understand what facilitates sample-efficient learning from historical datasets for
sequential decision-making, a problem that is popularly known as offline reinforcement …

Cited by 5 Related articles All 7 versions

[PDF] arxiv.org

Corruption Robust Offline Reinforcement Learning with Human Feedback

D Mandal, A Nika, P Kamalaruban, A Singla… - arXiv preprint arXiv …, 2024 - arxiv.org

We study data corruption robustness for reinforcement learning with human feedback
(RLHF) in an offline setting. Given an offline dataset of pairs of trajectories along with …

Cited by 3 Related articles All 3 versions

[PDF] arxiv.org

A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Low-Rank MDPs

K Hong, A Tewari - arXiv preprint arXiv:2402.04493, 2024 - arxiv.org

Offline reinforcement learning (RL) aims to learn a policy that maximizes the expected
cumulative reward using a pre-collected dataset. Offline RL with low-rank MDPs or general …

Cited by 2 Related articles All 2 versions

[PDF] arxiv.org

The Role of Inherent Bellman Error in Offline Reinforcement Learning with Linear Function Approximation

N Golowich, A Moitra - arXiv preprint arXiv:2406.11686, 2024 - arxiv.org

In this paper, we study the offline RL problem with linear function approximation. Our main
structural assumption is that the MDP has low inherent Bellman error, which stipulates that …

[PDF] arxiv.org

Offline RL via Feature-Occupancy Gradient Ascent

G Neu, N Okolo - arXiv preprint arXiv:2405.13755, 2024 - arxiv.org

We study offline Reinforcement Learning in large infinite-horizon discounted Markov
Decision Processes (MDPs) when the reward and transition models are linearly realizable …

A Primal-Dual Algorithm for Offline Constrained Reinforcement Learning with Linear MDPs

K Hong, A Tewari - Forty-first International Conference on Machine … - openreview.net

We study offline reinforcement learning (RL) with linear MDPs under the infinite-horizon
discounted setting which aims to learn a policy that maximizes the expected discounted …

[PDF] tudelft.nl

[PDF] Offline Reinforcement Learning via Inverse Optimization

I Dimanidis, T Ok, PM Esfahani - 2024 - dcsc.tudelft.nl

Inspired by the recent successes of Inverse Optimization (IO) across various application
domains, we propose a novel offline Reinforcement Learning (ORL) algorithm for …

[PDF] illinois.edu

Reinforcement learning under general function approximation and novel interaction settings

J Chen - 2023 - ideals.illinois.edu

Reinforcement Learning (RL) is an area of machine learning where an intelligent agent
solves sequential decision-making problems based on experience. Recent advances in the …