Causal reinforcement learning: A survey
Reinforcement learning is an essential paradigm for solving sequential decision problems
under uncertainty. Despite many remarkable achievements in recent decades, applying …
Maximize to explore: One objective function fusing estimation, planning, and exploration
In reinforcement learning (RL), balancing exploration and exploitation is crucial for
achieving an optimal policy in a sample-efficient way. To this end, existing sample-efficient …
Future-dependent value-based off-policy evaluation in POMDPs
We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general
function approximation. Existing methods such as sequential importance sampling …
A survey of demonstration learning
A Correia, LA Alexandre - Robotics and Autonomous Systems, 2024 - Elsevier
With the fast improvement of machine learning, reinforcement learning (RL) has been used
to automate human tasks in different areas. However, training such agents is difficult and …
Provably efficient UCB-type algorithms for learning predictive state representations
The general sequential decision-making problem, which includes Markov decision
processes (MDPs) and partially observable MDPs (POMDPs) as special cases, aims at …
Offline RL with discrete proxy representations for generalizability in POMDPs
Abstract Offline Reinforcement Learning (RL) has demonstrated promising results in various
applications by learning policies from previously collected datasets, reducing the need for …
Provably efficient offline reinforcement learning in regular decision processes
This paper deals with offline (or batch) Reinforcement Learning (RL) in episodic Regular
Decision Processes (RDPs). RDPs are the subclass of Non-Markov Decision Processes …
A policy gradient method for confounded POMDPs
In this paper, we propose a policy gradient method for confounded partially observable
Markov decision processes (POMDPs) with continuous state and observation spaces in the …
Offline RL with Observation Histories: Analyzing and Improving Sample Complexity
Offline reinforcement learning (RL) can in principle synthesize more optimal behavior from a
dataset consisting only of suboptimal trials. One way that this can happen is by "stitching" …
Tractable Offline Learning of Regular Decision Processes
This work studies offline Reinforcement Learning (RL) in a class of non-Markovian
environments called Regular Decision Processes (RDPs). In RDPs, the unknown …