Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity

L Shi, G Li, Y Wei, Y Chen… - … conference on machine …, 2022 - proceedings.mlr.press
Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …
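
The pessimism principle here is easy to state: penalize value estimates at state-action pairs the logged data covers poorly. A minimal tabular sketch, assuming a simple c/sqrt(n) penalty and 1/n step size (the paper's exact bonus and constants differ):

```python
import numpy as np

def pessimistic_q_update(Q, counts, s, a, r, s_next, gamma=0.99, c=1.0):
    """One tabular pessimistic Q-learning step on a logged transition.

    The c / sqrt(n) penalty stands in for the lower-confidence-bound term
    analyzed in the paper; the step size and constants are placeholders.
    """
    counts[s, a] += 1
    n = counts[s, a]
    penalty = c / np.sqrt(n)           # pessimism: discount rarely covered pairs
    target = r + gamma * Q[s_next].max() - penalty
    Q[s, a] += (target - Q[s, a]) / n  # simple 1/n step size (assumption)
    return Q, counts
```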

Provably efficient reinforcement learning with linear function approximation

C Jin, Z Yang, Z Wang… - Conference on learning …, 2020 - proceedings.mlr.press
Modern Reinforcement Learning (RL) is commonly applied to practical problems
with an enormous number of states, where function approximation must be deployed …
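
The workhorse in this line of work is least-squares value iteration with an elliptical confidence bonus. A minimal sketch of one ridge-regression step under a linear model, with lam and beta as placeholder constants:

```python
import numpy as np

def lsvi_step(Phi, rewards, v_next, lam=1.0, beta=1.0):
    """One ridge-regression step of least-squares value iteration (LSVI).

    Phi: (n, d) features of logged (s, a) pairs; v_next: (n,) estimated
    next-state values. Returns regression weights and the elliptical
    bonus beta * sqrt(phi^T Lambda^{-1} phi) per sample.
    """
    d = Phi.shape[1]
    Lambda = Phi.T @ Phi + lam * np.eye(d)          # regularized Gram matrix
    w = np.linalg.solve(Lambda, Phi.T @ (rewards + v_next))
    Lambda_inv = np.linalg.inv(Lambda)
    bonus = beta * np.sqrt(np.einsum('nd,de,ne->n', Phi, Lambda_inv, Phi))
    return w, bonus
```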

Is Q-learning provably efficient?

C Jin, Z Allen-Zhu, S Bubeck… - Advances in neural …, 2018 - proceedings.neurips.cc
Model-free reinforcement learning (RL) algorithms directly parameterize and
update value functions or policies, bypassing the modeling of the environment. They are …
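
The representative result here is Q-learning with a UCB-style bonus. A schematic episodic update, using the (H + 1)/(H + t) step size from the paper's analysis and a Hoeffding-type bonus with log factors and constants simplified away:

```python
import numpy as np

def ucb_q_update(Q, N, h, s, a, r, s_next, H, c=1.0):
    """One episodic Q-learning step with a UCB exploration bonus.

    The bonus ~ sqrt(H^3 / t) is schematic; the constant c and the
    omitted log factors are placeholders.
    """
    N[h, s, a] += 1
    t = N[h, s, a]
    alpha = (H + 1) / (H + t)
    bonus = c * np.sqrt(H**3 / t)                 # optimism for rarely tried (s, a)
    v_next = 0.0 if h == H - 1 else min(Q[h + 1, s_next].max(), H)
    Q[h, s, a] = (1 - alpha) * Q[h, s, a] + alpha * (r + v_next + bonus)
    return Q, N
```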

Provably efficient exploration in policy optimization

Q Cai, Z Yang, C Jin, Z Wang - International Conference on …, 2020 - proceedings.mlr.press
While policy-based reinforcement learning (RL) achieves tremendous successes in practice,
it is significantly less understood in theory, especially compared with value-based RL. In …

Sample-optimal parametric Q-learning using linearly additive features

L Yang, M Wang - International conference on machine …, 2019 - proceedings.mlr.press
Consider a Markov decision process (MDP) that admits a set of state-action features, which
can linearly express the process's probabilistic transition model. We propose a parametric Q …
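
Concretely, when the transition kernel is linear in K state-action features, the Bellman backup collapses onto a K-dimensional parameter; the display below sketches that reduction (notation mine, not the paper's):

```latex
P(s' \mid s, a) = \sum_{k=1}^{K} \phi_k(s, a)\, \psi_k(s'),
\qquad
Q(s, a) = r(s, a) + \gamma\, \phi(s, a)^\top w,
\quad
w_k = \sum_{s'} \psi_k(s')\, V(s'),
```

so learning Q amounts to estimating the K-dimensional vector w rather than a full state-action table.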

Minimum cost flows, MDPs, and ℓ1-regression in nearly linear time for dense instances

J Van Den Brand, YT Lee, YP Liu, T Saranurak… - Proceedings of the 53rd …, 2021 - dl.acm.org
In this paper we provide new randomized algorithms with improved runtimes for solving
linear programs with two-sided constraints. In the special case of the minimum cost flow …
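
For reference, a linear program with two-sided constraints has the standard form below; minimum cost flow is the special case in which A is a graph incidence matrix and x assigns a flow to each edge:

```latex
\min_{x \in \mathbb{R}^m} \; c^\top x
\quad \text{subject to} \quad
A x = b, \qquad \ell \le x \le u .
```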

Almost optimal model-free reinforcement learning via reference-advantage decomposition

Z Zhang, Y Zhou, X Ji - Advances in Neural Information …, 2020 - proceedings.neurips.cc
We study the reinforcement learning problem in the setting of finite-horizon episodic Markov
Decision Processes (MDPs) with S states, A actions, and episode length H. We propose a …
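
The reference-advantage idea is to split the next-state value into a slowly changing reference part, averaged over many samples, and a small-magnitude advantage part. A heavily simplified sketch; the actual algorithm's step sizes, accumulators, and bonuses are more involved:

```python
def ref_advantage_update(Q, mu_ref, mu_adv, N, h, s, a, r, s_next,
                         V, V_ref, b=0.0):
    """Heavily simplified reference-advantage value update.

    mu_ref averages the reference value V_ref at observed next states;
    mu_adv averages the (small) advantage V - V_ref. Step sizes and the
    bonus b are placeholders, not the paper's.
    """
    N[h, s, a] += 1
    t = N[h, s, a]
    # Running averages of the two components of E[V(s')].
    mu_ref[h, s, a] += (V_ref[h + 1, s_next] - mu_ref[h, s, a]) / t
    mu_adv[h, s, a] += ((V[h + 1, s_next] - V_ref[h + 1, s_next])
                        - mu_adv[h, s, a]) / t
    target = r + mu_ref[h, s, a] + mu_adv[h, s, a] + b
    Q[h, s, a] = min(Q[h, s, a], target)  # keep the optimistic estimate monotone
    return Q
```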

Near-optimal time and sample complexities for solving Markov decision processes with a generative model

A Sidford, M Wang, X Wu, L Yang… - Advances in Neural …, 2018 - proceedings.neurips.cc
In this paper we consider the problem of computing an ε-optimal policy of a
discounted Markov Decision Process (DMDP) provided we can only access its transition …
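
With a generative model one can query i.i.d. next-state samples at any (s, a). A bare-bones Monte Carlo backup in that access model, with the paper's variance-reduction machinery omitted and sample_next an assumed oracle:

```python
import numpy as np

def sampled_q_backup(sample_next, rewards, V, m, gamma=0.99):
    """One approximate Q backup from m generative-model samples per (s, a).

    sample_next(s, a, m) is an assumed oracle returning an array of m
    i.i.d. next-state indices; only the core Monte Carlo estimate of
    (PV)(s, a) is shown.
    """
    S, A = rewards.shape
    Q = np.zeros((S, A))
    for s in range(S):
        for a in range(A):
            next_states = sample_next(s, a, m)
            Q[s, a] = rewards[s, a] + gamma * V[next_states].mean()
    return Q
```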

Model-based reinforcement learning with a generative model is minimax optimal

A Agarwal, S Kakade, LF Yang - Conference on Learning …, 2020 - proceedings.mlr.press
This work considers the sample and computational complexity of obtaining an ε-optimal
policy in a discounted Markov Decision Process (MDP), given only access to a …
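
The model-based ("plug-in") recipe: spend the sample budget building an empirical transition model, then plan in it as if it were the truth. A sketch under the same assumed generative-model oracle, omitting the perturbation and truncation details of the minimax analysis:

```python
import numpy as np

def plug_in_policy(sample_next, rewards, S, A, n, gamma=0.99, iters=1000):
    """Model-based 'plug-in' recipe: estimate P, then plan in the estimate.

    sample_next(s, a, n) is an assumed oracle returning an array of n
    i.i.d. next-state indices.
    """
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            states, counts = np.unique(sample_next(s, a, n), return_counts=True)
            P_hat[s, a, states] = counts / n        # empirical transition row
    V = np.zeros(S)
    for _ in range(iters):                          # value iteration in the empirical MDP
        V = (rewards + gamma * P_hat @ V).max(axis=1)
    return (rewards + gamma * P_hat @ V).argmax(axis=1)  # greedy policy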

Provably efficient reinforcement learning for discounted MDPs with feature mapping

D Zhou, J He, Q Gu - International Conference on Machine …, 2021 - proceedings.mlr.press
Modern tasks in reinforcement learning have large state and action spaces. To deal with
them efficiently, one often uses predefined feature mapping to represent states and actions …
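
In the linear mixture model commonly used for this setting, the feature map is defined over (s, a, s') triples and the unknown dynamics reduce to a d-dimensional parameter, which the algorithm estimates online:

```latex
P(s' \mid s, a) = \big\langle \phi(s' \mid s, a),\, \theta^{*} \big\rangle,
\qquad \theta^{*} \in \mathbb{R}^{d}.
```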