On the reuse bias in off-policy reinforcement learning

Citing articles (3 results)

Augmenting Offline RL with Unlabeled Data

Z Wang, B Gangopadhyay, JF Yeh… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advancements in offline Reinforcement Learning (Offline RL) have led to an
increased focus on methods based on conservative policy updates to address the Out-of …

Off-Policy Selection for Optimizing Ad Display Timing in Mobile Games (Samsung Instant Plays)

K Siudek-Tkaczuk, S Kapka, J Alchimowicz… - Proceedings of the 18th …, 2024 - dl.acm.org

Off-Policy Selection (OPS) aims to select the best policy from a set of policies trained using
offline Reinforcement Learning. In this work, we describe our custom OPS method and its …

Mitigating Off-Policy Bias in Actor-Critic Methods with One-Step Q-learning: A Novel Correction Approach

B Saglam, DC Cicek, FB Mutlu, SS Kozat - arXiv preprint arXiv:2208.00755, 2022 - arxiv.org

Compared to on-policy counterparts, off-policy model-free deep reinforcement learning can
improve data efficiency by repeatedly using the previously gathered data. However, off …

Cited by 1; all 4 versions