Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo

H Ishfaq, Q Lan, P Xu, AR Mahmood, D Precup… - arXiv preprint arXiv …, 2023 - arxiv.org
We present a scalable and effective exploration strategy based on Thompson sampling for
reinforcement learning (RL). One of the key shortcomings of existing Thompson sampling …

Offline RL via Feature-Occupancy Gradient Ascent

G Neu, N Okolo - arXiv preprint arXiv:2405.13755, 2024 - arxiv.org
We study offline Reinforcement Learning in large infinite-horizon discounted Markov
Decision Processes (MDPs) when the reward and transition models are linearly realizable …

Confident Natural Policy Gradient for Local Planning in $q^\pi$-realizable Constrained MDPs

T Tian, LF Yang, C Szepesvári - arXiv preprint arXiv:2406.18529, 2024 - arxiv.org
The constrained Markov decision process (CMDP) framework emerges as an important
reinforcement learning approach for imposing safety or other critical objectives while …

Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear $q^\pi$-Realizability and Concentrability

V Tkachuk, G Weisz, C Szepesvári - arXiv preprint arXiv:2405.16809, 2024 - arxiv.org
We consider offline reinforcement learning (RL) in $H$-horizon Markov decision processes
(MDPs) under the linear $q^\pi$-realizability assumption, where the action-value function of …