Cliff diving: exploring reward surfaces in reinforcement learning environments

N Rahn, P D'Oro, H Wiltzer… - Advances in Neural …, 2024 - proceedings.neurips.cc

Deep reinforcement learning agents for continuous control are known to exhibit significant
instability in their performance over time. In this work, we provide a fresh perspective on …

Cited by 4 · Related articles · All 6 versions

Provable and practical: Efficient exploration in reinforcement learning via Langevin Monte Carlo

H Ishfaq, Q Lan, P Xu, AR Mahmood, D Precup… - arXiv preprint arXiv …, 2023 - arxiv.org

We present a scalable and effective exploration strategy based on Thompson sampling for
reinforcement learning (RL). One of the key shortcomings of existing Thompson sampling …

Cited by 16 · Related articles · All 6 versions

Conservative network for offline reinforcement learning

Z Peng, Y Liu, H Chen, Z Zhou - Knowledge-Based Systems, 2023 - Elsevier

Offline reinforcement learning (RL) aims to learn policies from static datasets. The value
overestimation of out-of-distribution (OOD) actions makes it difficult to directly apply general …

Cited by 1 · Related articles · All 2 versions

Stellaris: Staleness-Aware Distributed Reinforcement Learning with Serverless Computing

H Yu, H Wang, D Tiwari, J Li… - … Conference for High …, 2024 - ieeexplore.ieee.org

Deep reinforcement learning (DRL) has achieved remarkable success in diverse areas,
including gaming AI, scientific simulations, and large-scale (HPC) system scheduling. DRL …

Cited by 1 · Related articles · All 5 versions

Evolutionary strategy guided reinforcement learning via multibuffer communication

A Callaghan, K Mason, P Mannion - arXiv preprint arXiv:2306.11535, 2023 - arxiv.org

Evolutionary Algorithms and Deep Reinforcement Learning have both successfully solved
control problems across a variety of domains. Recently, algorithms have been proposed …

Cited by 3 · Related articles · All 4 versions

Stein Variational Evolution Strategies

CV Braun, RT Lange, M Toussaint - arXiv preprint arXiv:2410.10390, 2024 - arxiv.org

Stein Variational Gradient Descent (SVGD) is a highly efficient method to sample from an
unnormalized probability distribution. However, the SVGD update relies on gradients of the …

Investigating the Impact of Action Representations in Policy Gradient Algorithms

J Schneider, P Schumacher, D Häufle… - arXiv preprint arXiv …, 2023 - arxiv.org

Reinforcement learning (RL) is a versatile framework for learning to solve complex real-world
tasks. However, influences on the learning performance of RL algorithms are often …

Cited by 3 · Related articles · All 5 versions