Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity
Offline or batch reinforcement learning seeks to learn a near-optimal policy using historical
data without active exploration of the environment. To counter the insufficient coverage and …
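The pessimism referenced in this abstract is commonly implemented by subtracting an uncertainty penalty from the empirical Bellman update. A minimal sketch of that idea follows; the count-based penalty b(s,a) is an illustrative assumption, not a form taken from this entry:

\[
\widehat{Q}(s,a) \,\leftarrow\, \max\Bigl\{\, r(s,a) + \gamma\,\widehat{P}_{s,a}\widehat{V} - b(s,a),\ 0 \,\Bigr\},
\qquad
b(s,a) \,\propto\, \sqrt{\frac{\log(SA/\delta)}{N(s,a)}},
\]

where N(s,a) counts occurrences of (s,a) in the offline dataset, so poorly covered state-action pairs receive large penalties and the learned policy is steered away from them.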
Provably efficient reinforcement learning with linear function approximation
Modern reinforcement learning (RL) is commonly applied to practical problems
with an enormous number of states, where function approximation must be deployed …
Is Q-learning provably efficient?
Model-free reinforcement learning (RL) algorithms directly parameterize and
update value functions or policies, bypassing explicit modeling of the environment. They are …
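For context, the optimistic update analyzed in this line of work augments vanilla Q-learning with an exploration bonus. A hedged sketch for the episodic tabular setting; the learning rate and bonus below are the standard choices in this literature, stated here as assumptions:

\[
Q_h(s,a) \,\leftarrow\, (1-\alpha_t)\, Q_h(s,a) + \alpha_t \bigl[ r_h(s,a) + V_{h+1}(s') + b_t \bigr],
\qquad
\alpha_t = \frac{H+1}{H+t},
\]

where t is the number of visits to (s,a) at step h and b_t is an upper-confidence bonus on the order of \sqrt{H^3 \log(SAT/\delta)/t}.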
Provably efficient exploration in policy optimization
While policy-based reinforcement learning (RL) has achieved tremendous success in practice,
it is significantly less understood in theory, especially compared with value-based RL. In …
Sample-optimal parametric Q-learning using linearly additive features
Consider a Markov decision process (MDP) that admits a set of state-action features, which
can linearly express the process's probabilistic transition model. We propose a parametric Q …
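The feature assumption in this abstract can be written out explicitly. In the sketch below, φ, ψ, and K are my notation rather than labels from the entry: the transition kernel factors through K state-action features,

\[
P(s' \mid s, a) \;=\; \sum_{k=1}^{K} \phi_k(s,a)\, \psi_k(s'),
\]

so learning reduces to estimating K-dimensional quantities instead of the full |S| × |A| × |S| kernel.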
Minimum cost flows, MDPs, and ℓ1-regression in nearly linear time for dense instances
In this paper we provide new randomized algorithms with improved runtimes for solving
linear programs with two-sided constraints. In the special case of the minimum cost flow …
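A common reading of "linear programs with two-sided constraints" is the box-constrained standard form below; the notation is an assumption on my part rather than something stated in this snippet:

\[
\min_{x \in \mathbb{R}^n} \; c^\top x
\quad \text{subject to} \quad
A x = b, \;\; \ell \le x \le u,
\]

which captures minimum cost flow when A is a graph incidence matrix and the bounds ℓ, u encode edge capacities.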
Almost optimal model-free reinforcement learning via reference-advantage decomposition
We study the reinforcement learning problem in the setting of finite-horizon episodic Markov
Decision Processes (MDPs) with S states, A actions, and episode length H. We propose a …
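The reference-advantage decomposition in the title refers to splitting the value being backed up into a slowly changing reference part and a small residual. A hedged sketch of the idea, in my own notation:

\[
P_{s,a} V \;=\; P_{s,a} V^{\mathrm{ref}} \;+\; P_{s,a}\bigl(V - V^{\mathrm{ref}}\bigr),
\]

where the reference term can be estimated once to high accuracy from many samples, while the residual V − V^{ref} is small in magnitude, so both terms can be estimated accurately with fewer total samples.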
Near-optimal time and sample complexities for solving Markov decision processes with a generative model
In this paper we consider the problem of computing an ε-optimal policy of a
discounted Markov Decision Process (DMDP) provided we can only access its transition …
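For orientation, this entry and the next both concern the generative-model setting, where the benchmark (up to logarithmic factors; a hedged summary of this literature rather than a claim taken from these snippets) is a minimax sample complexity of

\[
\widetilde{\Theta}\!\left( \frac{|S|\,|A|}{(1-\gamma)^{3}\, \varepsilon^{2}} \right)
\]

transition samples to compute an ε-optimal policy in a γ-discounted MDP.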
Model-based reinforcement learning with a generative model is minimax optimal
This work considers the sample and computational complexity of obtaining an ε-optimal
policy in a discounted Markov Decision Process (MDP), given only access to a …
Provably efficient reinforcement learning for discounted MDPs with feature mapping
Modern tasks in reinforcement learning have large state and action spaces. To deal with
them efficiently, one often uses a predefined feature mapping to represent states and actions …