Gap-dependent unsupervised exploration for reinforcement learning

AJ Wagenmaker, Y Chen… - International …, 2022 - proceedings.mlr.press

Reward-free reinforcement learning (RL) considers the setting where the agent does not
have access to a reward function during exploration, but must propose a near-optimal policy …

Cited by 61 · Related articles · All 7 versions

Offline reinforcement learning under value and density-ratio realizability: the power of gaps

J Chen, N Jiang - Uncertainty in Artificial Intelligence, 2022 - proceedings.mlr.press

We consider a challenging theoretical problem in offline reinforcement learning (RL):
obtaining sample-efficiency guarantees with a dataset lacking sufficient coverage, under …

Cited by 36 · Related articles · All 7 versions

Provable offline reinforcement learning with human feedback

W Zhan, M Uehara, N Kallus, JD Lee… - ICML 2023 Workshop …, 2023 - openreview.net

In this paper, we investigate the problem of offline reinforcement learning with human
feedback where feedback is available in the form of preference between trajectory pairs …

Cited by 26 · Related articles · All 3 versions

On the statistical efficiency of reward-free exploration in non-linear RL

J Chen, A Modi, A Krishnamurthy… - Advances in Neural …, 2022 - proceedings.neurips.cc

We study reward-free reinforcement learning (RL) under general non-linear function
approximation, and establish sample efficiency and hardness results under various standard …

Cited by 28 · Related articles · All 10 versions

Provable offline preference-based reinforcement learning

W Zhan, M Uehara, N Kallus, JD Lee, W Sun - arXiv preprint arXiv …, 2023 - arxiv.org

In this paper, we investigate the problem of offline Preference-based Reinforcement
Learning (PbRL) with human feedback where feedback is available in the form of preference …

Cited by 18 · Related articles · All 3 versions

Computationally efficient PAC RL in POMDPs with latent determinism and conditional embeddings

M Uehara, A Sekhari, JD Lee… - … on Machine Learning, 2023 - proceedings.mlr.press

We study reinforcement learning with function approximation for large-scale Partially
Observable Markov Decision Processes (POMDPs) where the state space and observation …

Cited by 12 · Related articles · All 8 versions

Offline minimax soft-Q-learning under realizability and partial coverage

M Uehara, N Kallus, JD Lee… - Advances in Neural …, 2024 - proceedings.neurips.cc

We consider offline reinforcement learning (RL) where we only have access to offline
data. In contrast to numerous offline RL algorithms that necessitate the uniform coverage of …

Cited by 4 · Related articles · All 7 versions

Provably feedback-efficient reinforcement learning via active reward learning

D Kong, L Yang - Advances in Neural Information …, 2022 - proceedings.neurips.cc

An appropriate reward function is of paramount importance in specifying a task in
reinforcement learning (RL). Yet, it is known to be extremely challenging in practice to …

Cited by 6 · Related articles · All 8 versions

Towards minimax optimal reward-free reinforcement learning in linear MDPs

P Hu, Y Chen, L Huang - The Eleventh International Conference on …, 2022 - openreview.net

We study reward-free reinforcement learning with linear function approximation for episodic
Markov decision processes (MDPs). In this setting, an agent first interacts with the …

Cited by 5 · Related articles

Experiments focused on exploration in deep reinforcement learning

M Kaloev, G Krastev - 2021 5th International Symposium on …, 2021 - ieeexplore.ieee.org

Automation with deep neural networks is a popular topic. Personal artificial assistants, self-driving
cars, and recommendation algorithms all use reinforcement learning. However …

Cited by 12 · Related articles