An efficient and lightweight off-policy actor-critic reinforcement learning framework

H Zhang, H Ma, X Zhang, BW Mersha, L Wang… - Applied Soft …, 2024 - Elsevier
In the framework of current off-policy actor-critic methods, the state–action pairs in an
experience replay buffer (called historical behaviors) cannot be used to improve the policy …

Practical Critic Gradient based Actor Critic for On-Policy Reinforcement Learning

S Gurumurthy, Z Manchester… - Learning for Dynamics …, 2023 - proceedings.mlr.press
On-policy reinforcement learning algorithms have been shown to be remarkably efficient at
learning policies for continuous control robotics tasks. They are highly parallelizable and …

A multi-step on-policy deep reinforcement learning method assisted by off-policy policy evaluation

H Zhang, H Ma, BW Mersha, Y Jin - Applied Intelligence, 2024 - Springer
On-policy deep reinforcement learning (DRL) has the inherent advantage of using multi-step
interaction data for policy learning. However, on-policy DRL still faces challenges in …

Improved soft actor-critic: Mixing prioritized off-policy samples with on-policy experiences

C Banerjee, Z Chen, N Noman - IEEE Transactions on Neural …, 2022 - ieeexplore.ieee.org
Soft actor-critic (SAC) is an off-policy actor-critic (AC) reinforcement learning (RL) algorithm,
essentially based on entropy regularization. SAC trains a policy by maximizing the trade-off …

Noisy importance sampling actor-critic: an off-policy actor-critic with experience replay

N Tasfi, M Capretz - 2020 International Joint Conference on …, 2020 - ieeexplore.ieee.org
This paper presents Noisy Importance Sampling Actor-Critic (NISAC), a set of empirically
validated modifications to the advantage actor-critic algorithm (A2C), allowing off-policy …

Mitigating Off-Policy Bias in Actor-Critic Methods with One-Step Q-learning: A Novel Correction Approach

B Saglam, DC Cicek, FB Mutlu, SS Kozat - arXiv preprint arXiv:2208.00755, 2022 - arxiv.org
Compared to on-policy counterparts, off-policy model-free deep reinforcement learning can
improve data efficiency by repeatedly using the previously gathered data. However, off …

Phasic policy gradient

KW Cobbe, J Hilton, O Klimov… - … on Machine Learning, 2021 - proceedings.mlr.press
We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework
which modifies traditional on-policy actor-critic methods by separating policy and value …

Optimal actor-critic policy with optimized training datasets

C Banerjee, Z Chen, N Noman… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Actor-critic (AC) algorithms are known for their efficacy and high performance in solving
reinforcement learning problems, but they also suffer from low sampling efficiency. An AC …

Off-policy actor-critic

T Degris, M White, RS Sutton - arXiv preprint arXiv:1205.4839, 2012 - arxiv.org
This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our
algorithm is online and incremental, and its per-time-step complexity scales linearly with the …

Foresight distribution adjustment for off-policy reinforcement learning

R Chen, XH Liu, TS Liu, S Jiang, F Xu… - Proceedings of the 23rd …, 2024 - ifaamas.org
Off-policy reinforcement learning algorithms maintain a replay buffer to utilize samples
obtained from earlier policies. The sampling strategy that prioritizes certain data in a buffer to …