Causal reinforcement learning: A survey

Z Deng, J Jiang, G Long, C Zhang - arXiv preprint arXiv:2307.01452, 2023 - arxiv.org
Reinforcement learning is an essential paradigm for solving sequential decision problems
under uncertainty. Despite many remarkable achievements in recent decades, applying …

Maximize to explore: One objective function fusing estimation, planning, and exploration

Z Liu, M Lu, W Xiong, H Zhong, H Hu… - Advances in …, 2024 - proceedings.neurips.cc
In reinforcement learning (RL), balancing exploration and exploitation is crucial for
achieving an optimal policy in a sample-efficient way. To this end, existing sample-efficient …

Future-dependent value-based off-policy evaluation in POMDPs

M Uehara, H Kiyohara, A Bennett… - Advances in …, 2024 - proceedings.neurips.cc
We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general
function approximation. Existing methods such as sequential importance sampling …

A survey of demonstration learning

A Correia, LA Alexandre - Robotics and Autonomous Systems, 2024 - Elsevier
With the fast improvement of machine learning, reinforcement learning (RL) has been used
to automate human tasks in different areas. However, training such agents is difficult and …

Provably efficient UCB-type algorithms for learning predictive state representations

R Huang, Y Liang, J Yang - arXiv preprint arXiv:2307.00405, 2023 - arxiv.org
The general sequential decision-making problem, which includes Markov decision
processes (MDPs) and partially observable MDPs (POMDPs) as special cases, aims at …

Offline RL with discrete proxy representations for generalizability in POMDPs

P Gu, X Cai, D Xing, X Wang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Offline Reinforcement Learning (RL) has demonstrated promising results in various
applications by learning policies from previously collected datasets, reducing the need for …

Provably efficient offline reinforcement learning in regular decision processes

R Cipollone, A Jonsson, A Ronca… - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper deals with offline (or batch) Reinforcement Learning (RL) in episodic Regular
Decision Processes (RDPs). RDPs are the subclass of Non-Markov Decision Processes …

A policy gradient method for confounded POMDPs

M Hong, Z Qi, Y Xu - arXiv preprint arXiv:2305.17083, 2023 - arxiv.org
In this paper, we propose a policy gradient method for confounded partially observable
Markov decision processes (POMDPs) with continuous state and observation spaces in the …

Offline RL with Observation Histories: Analyzing and Improving Sample Complexity

J Hong, A Dragan, S Levine - arXiv preprint arXiv:2310.20663, 2023 - arxiv.org
Offline reinforcement learning (RL) can in principle synthesize more optimal behavior from a
dataset consisting only of suboptimal trials. One way that this can happen is by "stitching" …

Tractable Offline Learning of Regular Decision Processes

A Deb, R Cipollone, A Jonsson, A Ronca… - arXiv preprint arXiv …, 2024 - arxiv.org
This work studies offline Reinforcement Learning (RL) in a class of non-Markovian
environments called Regular Decision Processes (RDPs). In RDPs, the unknown …