Reward-free RL is no harder than reward-aware RL in linear Markov decision processes

AJ Wagenmaker, Y Chen… - International …, 2022 - proceedings.mlr.press
Reward-free reinforcement learning (RL) considers the setting where the agent does not
have access to a reward function during exploration, but must propose a near-optimal policy …

Offline reinforcement learning under value and density-ratio realizability: the power of gaps

J Chen, N Jiang - Uncertainty in Artificial Intelligence, 2022 - proceedings.mlr.press
We consider a challenging theoretical problem in offline reinforcement learning (RL):
obtaining sample-efficiency guarantees with a dataset lacking sufficient coverage, under …

Provable offline reinforcement learning with human feedback

W Zhan, M Uehara, N Kallus, JD Lee… - ICML 2023 Workshop …, 2023 - openreview.net
In this paper, we investigate the problem of offline reinforcement learning with human
feedback where feedback is available in the form of preference between trajectory pairs …

On the statistical efficiency of reward-free exploration in non-linear RL

J Chen, A Modi, A Krishnamurthy… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study reward-free reinforcement learning (RL) under general non-linear function
approximation, and establish sample efficiency and hardness results under various standard …

Provable offline preference-based reinforcement learning

W Zhan, M Uehara, N Kallus, JD Lee, W Sun - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we investigate the problem of offline Preference-based Reinforcement
Learning (PbRL) with human feedback where feedback is available in the form of preference …

Computationally efficient PAC RL in POMDPs with latent determinism and conditional embeddings

M Uehara, A Sekhari, JD Lee… - … on Machine Learning, 2023 - proceedings.mlr.press
We study reinforcement learning with function approximation for large-scale Partially
Observable Markov Decision Processes (POMDPs) where the state space and observation …

Offline minimax soft-Q-learning under realizability and partial coverage

M Uehara, N Kallus, JD Lee… - Advances in Neural …, 2024 - proceedings.neurips.cc
We consider offline reinforcement learning (RL) where we have access only to offline
data. In contrast to numerous offline RL algorithms that necessitate the uniform coverage of …

Provably feedback-efficient reinforcement learning via active reward learning

D Kong, L Yang - Advances in Neural Information …, 2022 - proceedings.neurips.cc
An appropriate reward function is of paramount importance in specifying a task in
reinforcement learning (RL). Yet, it is known to be extremely challenging in practice to …

Towards minimax optimal reward-free reinforcement learning in linear MDPs

P Hu, Y Chen, L Huang - The Eleventh International Conference on …, 2022 - openreview.net
We study reward-free reinforcement learning with linear function approximation for episodic
Markov decision processes (MDPs). In this setting, an agent first interacts with the …

Experiments focused on exploration in deep reinforcement learning

M Kaloev, G Krastev - 2021 5th International Symposium on …, 2021 - ieeexplore.ieee.org
Automation with deep neural networks is a popular topic. Personal artificial assistants,
self-driving cars, and recommendation algorithms all use reinforcement learning. However …