Policy optimization in a noisy neighborhood: On return landscapes in continuous control

N Rahn, P D'Oro, H Wiltzer… - Advances in Neural …, 2024 - proceedings.neurips.cc
Deep reinforcement learning agents for continuous control are known to exhibit significant
instability in their performance over time. In this work, we provide a fresh perspective on …

Provable and practical: Efficient exploration in reinforcement learning via Langevin Monte Carlo

H Ishfaq, Q Lan, P Xu, AR Mahmood, D Precup… - arXiv preprint arXiv …, 2023 - arxiv.org
We present a scalable and effective exploration strategy based on Thompson sampling for
reinforcement learning (RL). One of the key shortcomings of existing Thompson sampling …

Conservative network for offline reinforcement learning

Z Peng, Y Liu, H Chen, Z Zhou - Knowledge-Based Systems, 2023 - Elsevier
Offline reinforcement learning (RL) aims to learn policies from static datasets. The value
overestimation of out-of-distribution (OOD) actions makes it difficult to directly apply general …

Stellaris: Staleness-Aware Distributed Reinforcement Learning with Serverless Computing

H Yu, H Wang, D Tiwari, J Li… - … Conference for High …, 2024 - ieeexplore.ieee.org
Deep reinforcement learning (DRL) has achieved remarkable success in diverse areas,
including gaming AI, scientific simulations, and large-scale (HPC) system scheduling. DRL …

Evolutionary strategy guided reinforcement learning via multibuffer communication

A Callaghan, K Mason, P Mannion - arXiv preprint arXiv:2306.11535, 2023 - arxiv.org
Evolutionary Algorithms and Deep Reinforcement Learning have both successfully solved
control problems across a variety of domains. Recently, algorithms have been proposed …

Stein Variational Evolution Strategies

CV Braun, RT Lange, M Toussaint - arXiv preprint arXiv:2410.10390, 2024 - arxiv.org
Stein Variational Gradient Descent (SVGD) is a highly efficient method to sample from an
unnormalized probability distribution. However, the SVGD update relies on gradients of the …

Investigating the Impact of Action Representations in Policy Gradient Algorithms

J Schneider, P Schumacher, D Häufle… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning (RL) is a versatile framework for learning to solve complex
real-world tasks. However, influences on the learning performance of RL algorithms are often …

Seraph: A Performance-Cost Aware Tuner for Training Reinforcement Learning Model on Serverless Computing

J Han, X Wei, R Chen, H Chen - Proceedings of the 15th ACM SIGOPS …, 2024 - dl.acm.org
Training a reinforcement learning model is critical for various AI tasks. However, determining
the hardware resources required for training RL models is challenging due to the interaction …

Energy-Based Policy Constraint for Offline Reinforcement Learning

Z Peng, C Han, Y Liu, Z Zhou - CAAI International Conference on Artificial …, 2023 - Springer
Offline RL suffers from the distribution shift problem. One way to address this issue is to
constrain the divergence between the target policy and the behavior policy. However …

Nitro: Boosting Distributed Reinforcement Learning with Serverless Computing

H Yu, J Carter, H Wang, D Tiwari, J Li, SJ Park - intellisys.haow.us
Deep reinforcement learning (DRL) has achieved remarkable success in various fields,
including gaming AI [10, 35, 65, 70], robotics [13, 78], and system scheduling [12, 45, 49, 55] …