Reward-free RL is no harder than reward-aware RL in linear Markov decision processes
AJ Wagenmaker, Y Chen… - International …, 2022 - proceedings.mlr.press
Reward-free reinforcement learning (RL) considers the setting where the agent does not
have access to a reward function during exploration, but must propose a near-optimal policy …
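The reward-free protocol described above has two phases. First the agent explores without seeing any reward. Then it must output a near-optimal policy for whatever reward function it is given afterwards. The sketch below illustrates only that protocol, and it makes several simplifying assumptions that are not the paper's method:

- It uses a small tabular MDP, not a linear MDP.
- It explores with a uniformly random policy, which is not sample-efficient in general. Real reward-free algorithms direct their exploration.
- It plans with finite-horizon value iteration on an estimated model.

The helper names `explore_reward_free`, `estimate_model` and `plan` are hypothetical.

```python
import numpy as np

def explore_reward_free(P, n_episodes, horizon, rng):
    """Exploration phase: roll out a uniformly random policy (an assumption
    made for simplicity) and record only (s, a, s') counts; no reward is seen."""
    n_states, n_actions, _ = P.shape
    counts = np.zeros((n_states, n_actions, n_states))
    for _ in range(n_episodes):
        s = 0  # hypothetical fixed initial state
        for _ in range(horizon):
            a = rng.integers(n_actions)
            s_next = rng.choice(n_states, p=P[s, a])
            counts[s, a, s_next] += 1
            s = s_next
    return counts

def estimate_model(counts):
    """Empirical transition kernel; unvisited (s, a) pairs fall back to a
    uniform next-state distribution."""
    n_sa = counts.sum(axis=2, keepdims=True)
    n_states = counts.shape[2]
    return np.where(n_sa > 0, counts / np.maximum(n_sa, 1), 1.0 / n_states)

def plan(P_hat, reward, horizon):
    """Planning phase: finite-horizon value iteration on the estimated model
    for a reward function revealed only after exploration."""
    n_states, _, _ = P_hat.shape
    V = np.zeros(n_states)
    policy = np.zeros((horizon, n_states), dtype=int)
    for h in reversed(range(horizon)):
        Q = reward + P_hat @ V  # shape (n_states, n_actions)
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy, V

# Usage: one batch of reward-free data is reused for two different rewards.
rng = np.random.default_rng(0)
n_states, n_actions, horizon = 4, 2, 5
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
counts = explore_reward_free(P, n_episodes=200, horizon=horizon, rng=rng)
P_hat = estimate_model(counts)
for reward in (rng.random((n_states, n_actions)), np.eye(n_states, n_actions)):
    policy, V = plan(P_hat, reward, horizon)
```

The design point is that `counts` and `P_hat` never depend on `reward`. Any reward supplied later is handled purely by planning.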
Offline reinforcement learning under value and density-ratio realizability: the power of gaps
We consider a challenging theoretical problem in offline reinforcement learning (RL):
obtaining sample-efficiency guarantees with a dataset lacking sufficient coverage, under …
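"Coverage" is often quantified with the density ratio w(s, a) = d^π(s, a) / μ(s, a). Here d^π is the discounted occupancy of a target policy and μ is the data distribution. Its maximum, the concentrability coefficient, becomes infinite when the data never visit pairs that π visits.

The sketch below computes these quantities for a small tabular MDP with a known model. That setup is an illustrative assumption; the paper's realizability and gap conditions are not reproduced here. The helper names `occupancy` and `density_ratio` are hypothetical.

```python
import numpy as np

def occupancy(P, pi, rho, gamma):
    """Normalized discounted state-action occupancy d^pi(s, a), obtained from
    d_s^T = (1 - gamma) * rho^T (I - gamma * P_pi)^{-1}."""
    n_states = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state kernel under pi
    d_s = (1 - gamma) * np.linalg.solve((np.eye(n_states) - gamma * P_pi).T, rho)
    return d_s[:, None] * pi

def density_ratio(d_target, mu):
    """w(s, a) = d^pi(s, a) / mu(s, a); inf where pi visits but the data do not."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(d_target > 0, d_target / mu, 0.0)

rng = np.random.default_rng(1)
n_states, n_actions, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
rho = np.array([1.0, 0.0, 0.0])
pi_target = np.tile([1.0, 0.0], (n_states, 1))    # always takes action 0
pi_uniform = np.tile([0.5, 0.5], (n_states, 1))   # behavior covering every action
pi_narrow = np.tile([0.0, 1.0], (n_states, 1))    # behavior that never takes action 0
d = occupancy(P, pi_target, rho, gamma)
C_full = density_ratio(d, occupancy(P, pi_uniform, rho, gamma)).max()
C_none = density_ratio(d, occupancy(P, pi_narrow, rho, gamma)).max()
```

With the uniform behavior policy the coefficient is finite. With the narrow one it is infinite. The second case is the "lacking sufficient coverage" regime the snippet refers to.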
Provable offline reinforcement learning with human feedback
In this paper, we investigate the problem of offline reinforcement learning with human
feedback where feedback is available in the form of preference between trajectory pairs …
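Preference feedback between trajectory pairs is commonly modeled with a Bradley-Terry model: P(τ_a ≻ τ_b) = σ(R(τ_a) − R(τ_b)), where R is the trajectory's return.

The sketch below makes two assumptions that are not necessarily the paper's model, which may be more general:

- The preference model is Bradley-Terry.
- The per-step reward is linear, r(s, a) = θ·φ(s, a).

Under these assumptions, fitting θ from a fixed offline dataset of labeled pairs reduces to logistic regression on summed-feature differences. The function names and the ground-truth `theta_true` are hypothetical.

```python
import numpy as np

def traj_return(theta, feats):
    """Return under a linear per-step reward; feats has shape (T, d)."""
    return feats.sum(axis=0) @ theta

def preference_prob(theta, feats_a, feats_b):
    """Bradley-Terry: probability that trajectory a is preferred over b."""
    diff = traj_return(theta, feats_a) - traj_return(theta, feats_b)
    return 1.0 / (1.0 + np.exp(-diff))

def fit_reward(pairs, labels, dim, lr=0.1, steps=2000):
    """Maximum-likelihood theta via gradient ascent on the Bradley-Terry
    log-likelihood; labels[i] is 1 if pairs[i][0] was preferred."""
    # Each pair enters the likelihood only through its summed-feature difference.
    X = np.array([fa.sum(axis=0) - fb.sum(axis=0) for fa, fb in pairs])
    y = np.asarray(labels, dtype=float)
    theta = np.zeros(dim)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ theta))
        theta += lr * X.T @ (y - p) / len(y)
    return theta

# Usage: simulate an offline dataset of labeled trajectory pairs, then refit.
rng = np.random.default_rng(2)
dim, T, n_pairs = 3, 5, 500
theta_true = np.array([1.0, -0.5, 0.25])
pairs = [(rng.normal(size=(T, dim)), rng.normal(size=(T, dim))) for _ in range(n_pairs)]
labels = [rng.random() < preference_prob(theta_true, fa, fb) for fa, fb in pairs]
theta_hat = fit_reward(pairs, labels, dim)
```

Only reward differences are identifiable from preferences. Adding a constant to every per-step reward, for example, leaves every preference probability unchanged when the trajectories have equal length.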
On the statistical efficiency of reward-free exploration in non-linear RL
We study reward-free reinforcement learning (RL) under general non-linear function
approximation, and establish sample efficiency and hardness results under various standard …
Provable offline preference-based reinforcement learning
In this paper, we investigate the problem of offline Preference-based Reinforcement
Learning (PbRL) with human feedback where feedback is available in the form of preference …
Computationally efficient PAC RL in POMDPs with latent determinism and conditional embeddings
We study reinforcement learning with function approximation for large-scale Partially
Observable Markov Decision Processes (POMDPs) where the state space and observation …
Offline minimax soft-Q-learning under realizability and partial coverage
We consider offline reinforcement learning (RL) where we only have access to offline
data. In contrast to numerous offline RL algorithms that necessitate the uniform coverage of …
Provably feedback-efficient reinforcement learning via active reward learning
An appropriate reward function is of paramount importance in specifying a task in
reinforcement learning (RL). Yet, it is known to be extremely challenging in practice to …
Towards minimax optimal reward-free reinforcement learning in linear MDPs
We study reward-free reinforcement learning with linear function approximation for episodic
Markov decision processes (MDPs). In this setting, an agent first interacts with the …
Experiments focused on exploration in deep reinforcement learning
Automation with deep neural networks is a popular topic. Personal artificial assistants,
self-driving cars, and recommendation algorithms all use reinforcement learning. However …