Provably efficient offline reinforcement learning for partially observable Markov decision processes

Z Deng, J Jiang, G Long, C Zhang - arXiv preprint arXiv:2307.01452, 2023 - arxiv.org

Reinforcement learning is an essential paradigm for solving sequential decision problems
under uncertainty. Despite many remarkable achievements in recent decades, applying …

Cited by 12 · Related articles · All 5 versions

Maximize to explore: One objective function fusing estimation, planning, and exploration

Z Liu, M Lu, W Xiong, H Zhong, H Hu… - Advances in …, 2024 - proceedings.neurips.cc

In reinforcement learning (RL), balancing exploration and exploitation is crucial for
achieving an optimal policy in a sample-efficient way. To this end, existing sample-efficient …

Cited by 15 · Related articles · All 6 versions

Future-dependent value-based off-policy evaluation in POMDPs

M Uehara, H Kiyohara, A Bennett… - Advances in …, 2024 - proceedings.neurips.cc

We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general
function approximation. Existing methods such as sequential importance sampling …

Cited by 18 · Related articles · All 8 versions

A survey of demonstration learning

A Correia, LA Alexandre - Robotics and Autonomous Systems, 2024 - Elsevier

With the fast improvement of machine learning, reinforcement learning (RL) has been used
to automate human tasks in different areas. However, training such agents is difficult and …

Cited by 9 · Related articles · All 3 versions

Provably efficient UCB-type algorithms for learning predictive state representations

R Huang, Y Liang, J Yang - arXiv preprint arXiv:2307.00405, 2023 - arxiv.org

The general sequential decision-making problem, which includes Markov decision
processes (MDPs) and partially observable MDPs (POMDPs) as special cases, aims at …

Cited by 4 · Related articles · All 5 versions

Offline RL with discrete proxy representations for generalizability in POMDPs

P Gu, X Cai, D Xing, X Wang… - Advances in Neural …, 2024 - proceedings.neurips.cc

Offline Reinforcement Learning (RL) has demonstrated promising results in various
applications by learning policies from previously collected datasets, reducing the need for …

Provably efficient offline reinforcement learning in regular decision processes

R Cipollone, A Jonsson, A Ronca… - Advances in Neural …, 2024 - proceedings.neurips.cc

This paper deals with offline (or batch) Reinforcement Learning (RL) in episodic Regular
Decision Processes (RDPs). RDPs are the subclass of Non-Markov Decision Processes …

Cited by 1 · Related articles · All 3 versions

A policy gradient method for confounded POMDPs

M Hong, Z Qi, Y Xu - arXiv preprint arXiv:2305.17083, 2023 - arxiv.org

In this paper, we propose a policy gradient method for confounded partially observable
Markov decision processes (POMDPs) with continuous state and observation spaces in the …

Cited by 4 · Related articles · All 5 versions

Offline RL with Observation Histories: Analyzing and Improving Sample Complexity

J Hong, A Dragan, S Levine - arXiv preprint arXiv:2310.20663, 2023 - arxiv.org

Offline reinforcement learning (RL) can in principle synthesize more optimal behavior from a
dataset consisting only of suboptimal trials. One way that this can happen is by "stitching" …

Cited by 2 · Related articles · All 3 versions

Tractable Offline Learning of Regular Decision Processes

A Deb, R Cipollone, A Jonsson, A Ronca… - arXiv preprint arXiv …, 2024 - arxiv.org

This work studies offline Reinforcement Learning (RL) in a class of non-Markovian
environments called Regular Decision Processes (RDPs). In RDPs, the unknown …