Learning near-optimal policies with low inherent Bellman error
We study the exploration problem with approximate linear action-value functions in episodic
reinforcement learning under the notion of low inherent Bellman error, a condition normally …
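Informally, the inherent Bellman error measures how far the linear function class is from being closed under the Bellman operator. With per-step classes $\mathcal{Q}_h = \{\phi(s,a)^\top \theta\}$, one common way to write it (our notation, stated as background rather than the paper's exact definition) is

$$\mathcal{I} = \max_{h \in [H]} \; \sup_{Q_{h+1} \in \mathcal{Q}_{h+1}} \; \inf_{Q_h \in \mathcal{Q}_h} \; \big\| Q_h - \mathcal{T}_h Q_{h+1} \big\|_\infty,$$

where $(\mathcal{T}_h Q_{h+1})(s,a) = r_h(s,a) + \mathbb{E}_{s' \sim p_h(\cdot \mid s,a)}[\max_{a'} Q_{h+1}(s',a')]$; the results assume $\mathcal{I}$ is small rather than exactly zero.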
Conservative safety critics for exploration
Safe exploration presents a major challenge in reinforcement learning (RL): when active
data collection requires deploying partially trained policies, we must ensure that these …
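Roughly, the approach trains a conservative critic $Q_C(s,a)$ that overestimates the probability of catastrophic failure and only permits exploratory actions it certifies as safe; schematically (our notation, an illustrative sketch rather than the paper's exact objective):

$$\max_{\pi} \; \mathbb{E}_{\pi}\Big[\sum_t r_t\Big] \quad \text{subject to} \quad \mathbb{E}_{(s,a) \sim \pi}\big[Q_C(s,a)\big] \le \epsilon.$$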
Deep exploration via randomized value functions
We study the use of randomized value functions to guide deep exploration in reinforcement
learning. This offers an elegant means for synthesizing statistically and computationally …
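A minimal sketch of one standard instantiation of this idea, randomized least-squares value iteration with linear features; the data layout, feature map, and hyperparameters below are illustrative assumptions, not the paper's exact algorithm:

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_randomized_q(data, H, d, lam=1.0, sigma=1.0):
        # data[h]: list of transitions (phi, r, phi_next) where phi is the
        # d-dim feature of (s_h, a_h), r the reward, and phi_next an (A, d)
        # array of features for every action at s_{h+1}.
        theta = [np.zeros(d) for _ in range(H + 1)]  # theta[H] = 0 (terminal)
        for h in reversed(range(H)):
            Phi = np.array([phi for phi, _, _ in data[h]])       # (n, d)
            # regression targets: r + max_a' Q_{h+1}(s', a')
            y = np.array([r + np.max(nxt @ theta[h + 1])
                          for _, r, nxt in data[h]])
            Lam = Phi.T @ Phi + lam * np.eye(d)                  # Gram matrix
            mean = np.linalg.solve(Lam, Phi.T @ y)               # ridge fit
            # Gaussian noise with covariance sigma^2 * Lam^{-1}: samples a
            # plausible value function, which drives deep exploration
            noise = rng.multivariate_normal(np.zeros(d),
                                            sigma**2 * np.linalg.inv(Lam))
            theta[h] = mean + noise
        return theta  # act greedily: a = argmax_a phi(s, a) @ theta[h]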
PC-PG: Policy cover directed exploration for provable policy gradient learning
Direct policy gradient methods for reinforcement learning are a successful approach for a
variety of reasons: they are model-free, they directly optimize the performance metric of …
Is reinforcement learning more difficult than bandits? A near-optimal algorithm escaping the curse of horizon
Episodic reinforcement learning and contextual bandits are two widely studied sequential
decision-making problems. Episodic reinforcement learning generalizes contextual bandits …
Learning zero-sum simultaneous-move Markov games using function approximation and correlated equilibrium
In this work, we develop provably efficient reinforcement learning algorithms for two-player
zero-sum Markov games with simultaneous moves. We consider a family of Markov games …
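For background, in a zero-sum simultaneous-move game the Bellman backup replaces the usual max over actions with the value of a matrix game at each state: writing $Q_h(s, a, b)$ for the joint action-value,

$$V_h(s) = \max_{\mu \in \Delta(\mathcal{A})} \min_{\nu \in \Delta(\mathcal{B})} \; \mu^\top Q_h(s, \cdot, \cdot)\, \nu,$$

and, per the title, the algorithms studied here work with a correlated-equilibrium relaxation of this saddle point rather than computing an exact Nash equilibrium at every step.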
Frequentist regret bounds for randomized least-squares value iteration
A Zanette, D Brandfonbrener… - International …, 2020 - proceedings.mlr.press
We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning
(RL). When the state space is large or continuous, traditional tabular approaches are …
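The randomization analyzed here perturbs a ridge-regression estimate of each stage's value function; schematically (our notation, not the paper's exact statement):

$$\widehat{\theta}_h = \Lambda_h^{-1} \sum_i \phi(s_h^i, a_h^i)\Big(r_h^i + \max_{a'} \phi(s_{h+1}^i, a')^\top \widetilde{\theta}_{h+1}\Big), \qquad \widetilde{\theta}_h = \widehat{\theta}_h + \xi_h, \quad \xi_h \sim \mathcal{N}\big(0, \sigma^2 \Lambda_h^{-1}\big),$$

with $\Lambda_h = \lambda I + \sum_i \phi(s_h^i, a_h^i)\, \phi(s_h^i, a_h^i)^\top$; the frequentist analysis bounds the regret of acting greedily with respect to the perturbed $\widetilde{\theta}$.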
Model-based RL with optimistic posterior sampling: Structural conditions and sample complexity
We propose a general framework to design posterior sampling methods for model-based
RL. We show that the proposed algorithms can be analyzed by reducing regret to Hellinger …
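For reference, the squared Hellinger distance between distributions $P$ and $Q$ with densities $p$ and $q$ is

$$H^2(P, Q) = \frac{1}{2} \int \big(\sqrt{p(x)} - \sqrt{q(x)}\big)^2 \, dx,$$

the model-estimation error to which the regret analysis is reduced.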
Corruption-robust exploration in episodic reinforcement learning
T Lykouris, M Simchowitz… - … on Learning Theory, 2021 - proceedings.mlr.press
We initiate the study of episodic reinforcement learning under adversarial corruptions in both
the rewards and the transition probabilities of the underlying system, extending recent results …
On function approximation in reinforcement learning: Optimism in the face of large state spaces
The classical theory of reinforcement learning (RL) has focused on tabular and linear
representations of value functions. Further progress hinges on combining RL with modern …
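A canonical form of the optimism used with regression-based value estimates in the linear and kernel settings is an elliptical-confidence bonus (standard form; the constants and feature maps vary across these methods):

$$Q_h(s, a) = \phi(s, a)^\top \widehat{\theta}_h + \beta \sqrt{\phi(s, a)^\top \Lambda_h^{-1} \phi(s, a)},$$

where $\Lambda_h$ is the regularized feature-covariance (Gram) matrix of the observed data.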