Generalized proximal policy optimization with sample reuse

J Queeney, Y Paschalidis… - Advances in Neural …, 2021 - proceedings.neurips.cc
In real-world decision making tasks, it is critical for data-driven reinforcement learning
methods to be both stable and sample efficient. On-policy methods typically generate …

A novel policy based on action confidence limit to improve exploration efficiency in reinforcement learning

F Huang, X Deng, Y He, W Jiang - Information Sciences, 2023 - Elsevier
Reinforcement learning has been used to solve many intelligent decision-making problems.
However, reinforcement learning still faces a challenge of the low exploration efficiency …

A modified random network distillation algorithm and its application in USVs naval battle simulation

J Rao, X Xu, H Bian, J Chen, Y Wang, J Lei… - Ocean …, 2022 - Elsevier
Unmanned surface vessel (USV) operations will change the future form of maritime wars
profoundly, and one of the critical factors for victory is the cluster intelligence of USVs …

A Reinforcement Learning Method of Solving Markov Decision Processes: An Adaptive Exploration Model Based on Temporal Difference Error

X Wang, Z Yang, G Chen, Y Liu - Electronics, 2023 - mdpi.com
Traditional backward recursion methods face a fundamental challenge in solving Markov
Decision Processes (MDP), where there exists a contradiction between the need for …

Explanation-aware experience replay in rule-dense environments

F Sovrano, A Raymond, A Prorok - IEEE Robotics and …, 2021 - ieeexplore.ieee.org
Human environments are often regulated by explicit and complex rulesets. Integrating
Reinforcement Learning (RL) agents into such environments motivates the development of …

Mixed experience sampling for off-policy reinforcement learning

J Yu, J Li, S Lü, S Han - Expert Systems with Applications, 2024 - Elsevier
In deep reinforcement learning, experience replay is usually used to improve data efficiency
and alleviate experience forgetting. However, online reinforcement learning is often …

Lucid dreaming for experience replay: refreshing past states with the current policy

Y Du, G Warnell, A Gebremedhin, P Stone… - Neural Computing and …, 2022 - Springer
Experience replay (ER) improves the data efficiency of off-policy reinforcement learning (RL)
algorithms by allowing an agent to store and reuse its past experiences in a replay buffer …

Towards Improving Exploration in Self-Imitation Learning using Intrinsic Motivation

A Andres, E Villar-Rodriguez… - 2022 IEEE Symposium …, 2022 - ieeexplore.ieee.org
Reinforcement Learning has emerged as a strong alternative to solve optimization tasks
efficiently. The use of these algorithms highly depends on the feedback signals provided by …

Elements of episodic memory: insights from artificial agents

A Boyle, A Blomkvist - … Transactions of the Royal Society B …, 2024 - eprints.lse.ac.uk
Many recent AI systems take inspiration from biological episodic memory. Here, we ask how
these 'episodic-inspired' AI systems might inform our understanding of biological episodic …

An effective maximum entropy exploration approach for deceptive game in reinforcement learning

C Li, X Wei, Y Zhao, X Geng - Neurocomputing, 2020 - Elsevier
Deceptive games are games that utilize the reward structure to keep the agent away from
the global optimization and have been grown up to become a huge challenge in the field of …