When is partially observable reinforcement learning not scary?

Q Liu, A Chung, C Szepesvári… - Conference on Learning …, 2022 - proceedings.mlr.press
Partial observability is ubiquitous in applications of Reinforcement Learning (RL), in which
agents learn to make a sequence of decisions despite lacking complete information about …

Provably efficient reinforcement learning in partially observable dynamical systems

M Uehara, A Sekhari, JD Lee… - Advances in Neural …, 2022 - proceedings.neurips.cc
Abstract We study Reinforcement Learning for partially observable systems using function
approximation. We propose a new PO-bilinear framework, that is general enough to include …

Optimistic MLE: A generic model-based algorithm for partially observable sequential decision making

Q Liu, P Netrapalli, C Szepesvari, C Jin - Proceedings of the 55th …, 2023 - dl.acm.org
This paper introduces a simple, efficient learning algorithm for general sequential decision
making. The algorithm combines Optimism for exploration with Maximum Likelihood …

Learning in observable POMDPs, without computationally intractable oracles

N Golowich, A Moitra, D Rohatgi - Advances in neural …, 2022 - proceedings.neurips.cc
Much of reinforcement learning theory is built on top of oracles that are computationally hard
to implement. Specifically for learning near-optimal policies in Partially Observable Markov …

PAC reinforcement learning for predictive state representations

W Zhan, M Uehara, W Sun, JD Lee - arXiv preprint arXiv:2207.05738, 2022 - arxiv.org
In this paper we study online Reinforcement Learning (RL) in partially observable dynamical
systems. We focus on the Predictive State Representations (PSRs) model, which is an …

Learning in POMDPs is sample-efficient with hindsight observability

J Lee, A Agarwal, C Dann… - … Conference on Machine …, 2023 - proceedings.mlr.press
POMDPs capture a broad class of decision making problems, but hardness results suggest
that learning is intractable even in simple settings due to the inherent partial observability …

GEC: A unified framework for interactive decision making in MDP, POMDP, and beyond

H Zhong, W Xiong, S Zheng, L Wang, Z Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
We study sample efficient reinforcement learning (RL) under the general framework of
interactive decision making, which includes Markov decision process (MDP), partially …

Future-dependent value-based off-policy evaluation in POMDPs

M Uehara, H Kiyohara, A Bennett… - Advances in …, 2024 - proceedings.neurips.cc
We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general
function approximation. Existing methods such as sequential importance sampling …

Lower bounds for learning in revealing POMDPs

F Chen, H Wang, C Xiong, S Mei… - … Conference on Machine …, 2023 - proceedings.mlr.press
This paper studies the fundamental limits of reinforcement learning (RL) in the challenging
partially observable setting. While it is well-established that learning in Partially Observable …

Posterior sampling for competitive RL: function approximation and partial observation

S Qiu, Z Dai, H Zhong, Z Wang… - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper investigates posterior sampling algorithms for competitive reinforcement learning
(RL) in the context of general function approximations. Focusing on zero-sum Markov games …