Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo
We present a scalable and effective exploration strategy based on Thompson sampling for
reinforcement learning (RL). One of the key shortcomings of existing Thompson sampling …
Offline RL via Feature-Occupancy Gradient Ascent
We study offline Reinforcement Learning in large infinite-horizon discounted Markov
Decision Processes (MDPs) when the reward and transition models are linearly realizable …
Confident Natural Policy Gradient for Local Planning in $q^\pi$-realizable Constrained MDPs
The constrained Markov decision process (CMDP) framework emerges as an important
reinforcement learning approach for imposing safety or other critical objectives while …
Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear $q^\pi$-Realizability and Concentrability
We consider offline reinforcement learning (RL) in $ H $-horizon Markov decision processes
(MDPs) under the linear $ q^\pi $-realizability assumption, where the action-value function of …