A review of off-policy evaluation in reinforcement learning

M Uehara, C Shi, N Kallus - arXiv preprint arXiv:2212.06355, 2022 - arxiv.org
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …

Is pessimism provably efficient for offline RL?

Y Jin, Z Yang, Z Wang - International Conference on …, 2021 - proceedings.mlr.press
We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …

Provable benefits of actor-critic methods for offline reinforcement learning

A Zanette, MJ Wainwright… - Advances in neural …, 2021 - proceedings.neurips.cc
Actor-critic methods are widely used in offline reinforcement learning practice, but are not so
well-understood theoretically. We propose a new offline actor-critic algorithm that naturally …

Towards instance-optimal offline reinforcement learning with pessimism

M Yin, YX Wang - Advances in neural information …, 2021 - proceedings.neurips.cc
We study the offline reinforcement learning (offline RL) problem, where the goal is to
learn a reward-maximizing policy in an unknown Markov Decision Process (MDP) …

Double reinforcement learning for efficient off-policy evaluation in Markov decision processes

N Kallus, M Uehara - Journal of Machine Learning Research, 2020 - jmlr.org
Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision
policies without needing to conduct exploration, which is often costly or otherwise infeasible …

Minimax-optimal off-policy evaluation with linear function approximation

Y Duan, Z Jia, M Wang - International Conference on …, 2020 - proceedings.mlr.press
This paper studies the statistical theory of off-policy evaluation with function approximation in
the batch data reinforcement learning problem. We consider a regression-based fitted Q …

Mitigating covariate shift in imitation learning via offline data with partial coverage

J Chang, M Uehara, D Sreenivas… - Advances in Neural …, 2021 - proceedings.neurips.cc
This paper studies offline Imitation Learning (IL) where an agent learns to imitate an expert
demonstrator without additional online environment interactions. Instead, the learner is …

Near-optimal offline reinforcement learning with linear representation: Leveraging variance information with pessimism

M Yin, Y Duan, M Wang, YX Wang - arXiv preprint arXiv:2203.05804, 2022 - arxiv.org
Offline reinforcement learning, which seeks to utilize offline/historical data to optimize
sequential decision-making strategies, has gained surging prominence in recent studies …

Reinforcement learning with guarantees: a review

P Osinenko, D Dobriborsci, W Aumer - IFAC-PapersOnLine, 2022 - Elsevier
Reinforcement learning is concerned with a generic concept of an agent acting in an
environment. From the control theory standpoint, reinforcement learning may be considered …

Near-optimal offline reinforcement learning via double variance reduction

M Yin, Y Bai, YX Wang - Advances in neural information …, 2021 - proceedings.neurips.cc
We consider the problem of offline reinforcement learning (RL), a well-motivated setting of
RL that aims at policy optimization using only historical data. Despite its wide applicability …