When is partially observable reinforcement learning not scary?

Q Liu, A Chung, C Szepesvári… - Conference on Learning …, 2022 - proceedings.mlr.press
Partial observability is ubiquitous in applications of Reinforcement Learning (RL), in which
agents learn to make a sequence of decisions despite lacking complete information about …

Provably efficient reinforcement learning in partially observable dynamical systems

M Uehara, A Sekhari, JD Lee… - Advances in Neural …, 2022 - proceedings.neurips.cc
Abstract We study Reinforcement Learning for partially observable systems using function
approximation. We propose a new PO-bilinear framework, that is general enough to include …

Optimistic MLE: A generic model-based algorithm for partially observable sequential decision making

Q Liu, P Netrapalli, C Szepesvari, C Jin - Proceedings of the 55th …, 2023 - dl.acm.org
This paper introduces a simple, efficient learning algorithm for general sequential decision
making. The algorithm combines Optimism for exploration with Maximum Likelihood …

Learning in observable POMDPs, without computationally intractable oracles

N Golowich, A Moitra, D Rohatgi - Advances in neural …, 2022 - proceedings.neurips.cc
Much of reinforcement learning theory is built on top of oracles that are computationally hard
to implement. Specifically for learning near-optimal policies in Partially Observable Markov …

PAC reinforcement learning for predictive state representations

W Zhan, M Uehara, W Sun, JD Lee - arXiv preprint arXiv:2207.05738, 2022 - arxiv.org
In this paper we study online Reinforcement Learning (RL) in partially observable dynamical
systems. We focus on the Predictive State Representations (PSRs) model, which is an …

Learning in POMDPs is sample-efficient with hindsight observability

J Lee, A Agarwal, C Dann… - … Conference on Machine …, 2023 - proceedings.mlr.press
POMDPs capture a broad class of decision making problems, but hardness results suggest
that learning is intractable even in simple settings due to the inherent partial observability …

GEC: A unified framework for interactive decision making in MDP, POMDP, and beyond

H Zhong, W Xiong, S Zheng, L Wang, Z Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
We study sample efficient reinforcement learning (RL) under the general framework of
interactive decision making, which includes Markov decision process (MDP), partially …

Future-dependent value-based off-policy evaluation in POMDPs

M Uehara, H Kiyohara, A Bennett… - Advances in …, 2024 - proceedings.neurips.cc
We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general
function approximation. Existing methods such as sequential importance sampling …

Lower bounds for learning in revealing POMDPs

F Chen, H Wang, C Xiong, S Mei… - … Conference on Machine …, 2023 - proceedings.mlr.press
This paper studies the fundamental limits of reinforcement learning (RL) in the challenging
partially observable setting. While it is well-established that learning in Partially Observable …

Posterior sampling for competitive RL: function approximation and partial observation

S Qiu, Z Dai, H Zhong, Z Wang… - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper investigates posterior sampling algorithms for competitive reinforcement learning
(RL) in the context of general function approximations. Focusing on zero-sum Markov games …