Near-optimal time and sample complexities for solving Markov decision processes with a generative...

K Zhang, Z Yang, T Başar - Handbook of reinforcement learning and …, 2021 - Springer

Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …

被引用次数：1638 相关文章所有 8 个版本

[PDF] arxiv.org

An overview of multi-agent reinforcement learning from game theoretical perspective

Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org

Following the remarkable success of the AlphaGO series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

被引用次数：340 相关文章所有 2 个版本

[PDF] neurips.cc

Bridging offline reinforcement learning and imitation learning: A tale of pessimism

P Rashidinejad, B Zhu, C Ma, J Jiao… - Advances in Neural …, 2021 - proceedings.neurips.cc

Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from
a fixed dataset without active data collection. Based on the composition of the offline dataset …

被引用次数：307 相关文章所有 8 个版本

[PDF] mlr.press

Pessimistic q-learning for offline reinforcement learning: Towards optimal sample complexity

L Shi, G Li, Y Wei, Y Chen… - … conference on machine …, 2022 - proceedings.mlr.press

Offline or batch reinforcement learning seeks to learn a near-optimal policy using history
data without active exploration of the environment. To counter the insufficient coverage and …

被引用次数：103 相关文章所有 10 个版本

[PDF] mlr.press

Nearly minimax optimal reinforcement learning for linear mixture markov decision processes

D Zhou, Q Gu, C Szepesvari - Conference on Learning …, 2021 - proceedings.mlr.press

We study reinforcement learning (RL) with linear function approximation where the
underlying transition probability kernel of the Markov decision process (MDP) is a linear …

被引用次数：239 相关文章所有 7 个版本

[PDF] neurips.cc

Robust reinforcement learning using offline data

K Panaganti, Z Xu, D Kalathil… - Advances in neural …, 2022 - proceedings.neurips.cc

The goal of robust reinforcement learning (RL) is to learn a policy that is robust against the
uncertainty in model parameters. Parameter uncertainty commonly occurs in many real …

被引用次数：81 相关文章所有 8 个版本

[PDF] mlr.press

Provably efficient exploration in policy optimization

Q Cai, Z Yang, C Jin, Z Wang - International Conference on …, 2020 - proceedings.mlr.press

While policy-based reinforcement learning (RL) achieves tremendous successes in practice,
it is significantly less understood in theory, especially compared with value-based RL. In …

被引用次数：320 相关文章所有 9 个版本

[PDF] jmlr.org

On the convergence rates of policy gradient methods

L Xiao - Journal of Machine Learning Research, 2022 - jmlr.org

We consider infinite-horizon discounted Markov decision problems with finite state and
action spaces and study the convergence rates of the projected policy gradient method and …

被引用次数：111 相关文章所有 4 个版本

[PDF] neurips.cc

The curious price of distributional robustness in reinforcement learning with a generative model

L Shi, G Li, Y Wei, Y Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc

This paper investigates model robustness in reinforcement learning (RL) via the framework
of distributionally robust Markov decision processes (RMDPs). Despite recent efforts, the …

被引用次数：37 相关文章所有 10 个版本

[PDF] mlr.press

Sample-optimal parametric q-learning using linearly additive features

L Yang, M Wang - International conference on machine …, 2019 - proceedings.mlr.press

Consider a Markov decision process (MDP) that admits a set of state-action features, which
can linearly express the process's probabilistic transition model. We propose a parametric Q …

被引用次数：363 相关文章所有 9 个版本