Augmenting Offline RL with Unlabeled Data

Z Wang, B Gangopadhyay, JF Yeh… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in offline Reinforcement Learning (Offline RL) have led to an
increased focus on methods based on conservative policy updates to address the Out-of …

Off-Policy Selection for Optimizing Ad Display Timing in Mobile Games (Samsung Instant Plays)

K Siudek-Tkaczuk, S Kapka, J Alchimowicz… - Proceedings of the 18th …, 2024 - dl.acm.org
Off-Policy Selection (OPS) aims to select the best policy from a set of policies trained using
offline Reinforcement Learning. In this work, we describe our custom OPS method and its …

Mitigating Off-Policy Bias in Actor-Critic Methods with One-Step Q-learning: A Novel Correction Approach

B Saglam, DC Cicek, FB Mutlu, SS Kozat - arXiv preprint arXiv:2208.00755, 2022 - arxiv.org
Compared to on-policy counterparts, off-policy model-free deep reinforcement learning can
improve data efficiency by repeatedly using the previously gathered data. However, off …