An efficient and lightweight off-policy actor-critic reinforcement learning framework

H Zhang, H Ma, X Zhang, BW Mersha, L Wang… - Applied Soft …, 2024 - Elsevier
In the framework of current off-policy actor-critic methods, the state–action pairs in an
experience replay buffer (called historical behaviors) cannot be used to improve the policy …

Practical Critic Gradient based Actor Critic for On-Policy Reinforcement Learning

S Gurumurthy, Z Manchester… - Learning for Dynamics …, 2023 - proceedings.mlr.press
On-policy reinforcement learning algorithms have been shown to be remarkably efficient at
learning policies for continuous control robotics tasks. They are highly parallelizable and …

A multi-step on-policy deep reinforcement learning method assisted by off-policy policy evaluation

H Zhang, H Ma, BW Mersha, Y Jin - Applied Intelligence, 2024 - Springer
On-policy deep reinforcement learning (DRL) has the inherent advantage of using multi-step
interaction data for policy learning. However, on-policy DRL still faces challenges in …

Improved soft actor-critic: Mixing prioritized off-policy samples with on-policy experiences

C Banerjee, Z Chen, N Noman - IEEE Transactions on Neural …, 2022 - ieeexplore.ieee.org
Soft actor-critic (SAC) is an off-policy actor-critic (AC) reinforcement learning (RL) algorithm,
essentially based on entropy regularization. SAC trains a policy by maximizing the trade-off …

Noisy importance sampling actor-critic: an off-policy actor-critic with experience replay

N Tasfi, M Capretz - 2020 International Joint Conference on …, 2020 - ieeexplore.ieee.org
This paper presents Noisy Importance Sampling Actor-Critic (NISAC), a set of empirically
validated modifications to the advantage actor-critic algorithm (A2C), allowing off-policy …

Mitigating Off-Policy Bias in Actor-Critic Methods with One-Step Q-learning: A Novel Correction Approach

B Saglam, DC Cicek, FB Mutlu, SS Kozat - arXiv preprint arXiv:2208.00755, 2022 - arxiv.org
Compared to on-policy counterparts, off-policy model-free deep reinforcement learning can
improve data efficiency by repeatedly using the previously gathered data. However, off …

Phasic policy gradient

KW Cobbe, J Hilton, O Klimov… - … on Machine Learning, 2021 - proceedings.mlr.press
We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework
which modifies traditional on-policy actor-critic methods by separating policy and value …

Optimal actor-critic policy with optimized training datasets

C Banerjee, Z Chen, N Noman… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Actor-critic (AC) algorithms are known for their efficacy and high performance in solving
reinforcement learning problems, but they also suffer from low sampling efficiency. An AC …

Off-policy actor-critic

T Degris, M White, RS Sutton - arXiv preprint arXiv:1205.4839, 2012 - arxiv.org
This paper presents the first actor-critic algorithm for off-policy reinforcement learning. Our
algorithm is online and incremental, and its per-time-step complexity scales linearly with the …

Foresight distribution adjustment for off-policy reinforcement learning

R Chen, XH Liu, TS Liu, S Jiang, F Xu… - Proceedings of the 23rd …, 2024 - ifaamas.org
Off-policy reinforcement learning algorithms maintain a replay buffer to utilize samples
obtained from earlier policies. The sampling strategy that prioritizes certain data in a buffer to …