Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity
Offline or batch reinforcement learning seeks to learn a near-optimal policy using historical
data without active exploration of the environment. To counter the insufficient coverage and …
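The pessimism referenced in this abstract is commonly implemented by subtracting an uncertainty penalty from the empirical Bellman update. A minimal sketch of that idea follows; the count-based penalty b(s,a) is an illustrative assumption, not a form taken from this entry:

\[
\widehat{Q}(s,a) \,\leftarrow\, \max\Bigl\{\, r(s,a) + \gamma\,\widehat{P}_{s,a}\widehat{V} - b(s,a),\ 0 \,\Bigr\},
\qquad
b(s,a) \,\propto\, \sqrt{\frac{\log(SA/\delta)}{N(s,a)}},
\]

where N(s,a) counts occurrences of (s,a) in the offline dataset, so poorly covered state-action pairs receive large penalties and the learned policy is steered away from them.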
Provably efficient reinforcement learning with linear function approximation
Modern reinforcement learning (RL) is commonly applied to practical problems
with an enormous number of states, where function approximation must be deployed …
Is Q-learning provably efficient?
Model-free reinforcement learning (RL) algorithms directly parameterize and
update value functions or policies, bypassing explicit modeling of the environment. They are …
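For context, the optimistic update analyzed in this line of work augments vanilla Q-learning with an exploration bonus. A hedged sketch for the episodic tabular setting; the learning rate and bonus below are the standard choices in this literature, stated here as assumptions:

\[
Q_h(s,a) \,\leftarrow\, (1-\alpha_t)\, Q_h(s,a) + \alpha_t \bigl[ r_h(s,a) + V_{h+1}(s') + b_t \bigr],
\qquad
\alpha_t = \frac{H+1}{H+t},
\]

where t is the number of visits to (s,a) at step h and b_t is an upper-confidence bonus on the order of \sqrt{H^3 \log(SAT/\delta)/t}.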
Provably efficient exploration in policy optimization
While policy-based reinforcement learning (RL) has achieved tremendous success in practice,
it is significantly less understood in theory, especially compared with value-based RL. In …
Sample-optimal parametric Q-learning using linearly additive features
Consider a Markov decision process (MDP) that admits a set of state-action features, which
can linearly express the process's probabilistic transition model. We propose a parametric Q …
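The feature assumption in this abstract can be written out explicitly. In the sketch below, φ, ψ, and K are my notation rather than labels from the entry: the transition kernel factors through K state-action features,

\[
P(s' \mid s, a) \;=\; \sum_{k=1}^{K} \phi_k(s,a)\, \psi_k(s'),
\]

so learning reduces to estimating K-dimensional quantities instead of the full |S| × |A| × |S| kernel.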
Minimum cost flows, MDPs, and ℓ1-regression in nearly linear time for dense instances
In this paper we provide new randomized algorithms with improved runtimes for solving
linear programs with two-sided constraints. In the special case of the minimum cost flow …
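A common reading of "linear programs with two-sided constraints" is the box-constrained standard form below; the notation is an assumption on my part rather than something stated in this snippet:

\[
\min_{x \in \mathbb{R}^n} \; c^\top x
\quad \text{subject to} \quad
A x = b, \;\; \ell \le x \le u,
\]

which captures minimum cost flow when A is a graph incidence matrix and the bounds ℓ, u encode edge capacities.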
Almost optimal model-free reinforcement learning via reference-advantage decomposition
We study the reinforcement learning problem in the setting of finite-horizon episodic Markov
Decision Processes (MDPs) with S states, A actions, and episode length H. We propose a …
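The reference-advantage decomposition in the title refers to splitting the value being backed up into a slowly changing reference part and a small residual. A hedged sketch of the idea, in my own notation:

\[
P_{s,a} V \;=\; P_{s,a} V^{\mathrm{ref}} \;+\; P_{s,a}\bigl(V - V^{\mathrm{ref}}\bigr),
\]

where the reference term can be estimated once to high accuracy from many samples, while the residual V − V^{ref} is small in magnitude, so both terms can be estimated accurately with fewer total samples.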
Near-optimal time and sample complexities for solving Markov decision processes with a generative model
In this paper we consider the problem of computing an ε-optimal policy of a
discounted Markov Decision Process (DMDP) provided we can only access its transition …
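For orientation, this entry and the next both concern the generative-model setting, where the benchmark (up to logarithmic factors; a hedged summary of this literature rather than a claim taken from these snippets) is a minimax sample complexity of

\[
\widetilde{\Theta}\!\left( \frac{|S|\,|A|}{(1-\gamma)^{3}\, \varepsilon^{2}} \right)
\]

transition samples to compute an ε-optimal policy in a γ-discounted MDP.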
Model-based reinforcement learning with a generative model is minimax optimal
This work considers the sample and computational complexity of obtaining an ε-optimal
policy in a discounted Markov Decision Process (MDP), given only access to a …
Provably efficient reinforcement learning for discounted MDPs with feature mapping
Modern tasks in reinforcement learning have large state and action spaces. To deal with
them efficiently, one often uses a predefined feature mapping to represent states and actions …