Policy optimization in a noisy neighborhood: On return landscapes in continuous control
Deep reinforcement learning agents for continuous control are known to exhibit significant
instability in their performance over time. In this work, we provide a fresh perspective on …
Provable and practical: Efficient exploration in reinforcement learning via Langevin Monte Carlo
We present a scalable and effective exploration strategy based on Thompson sampling for
reinforcement learning (RL). One of the key shortcomings of existing Thompson sampling …
Conservative network for offline reinforcement learning
Z Peng, Y Liu, H Chen, Z Zhou - Knowledge-Based Systems, 2023 - Elsevier
Offline reinforcement learning (RL) aims to learn policies from static datasets. The value
overestimation of out-of-distribution (OOD) actions makes it difficult to directly apply general …
Stellaris: Staleness-Aware Distributed Reinforcement Learning with Serverless Computing
Deep reinforcement learning (DRL) has achieved remarkable success in diverse areas,
including gaming AI, scientific simulations, and large-scale (HPC) system scheduling. DRL …
Evolutionary strategy guided reinforcement learning via multibuffer communication
Evolutionary Algorithms and Deep Reinforcement Learning have both successfully solved
control problems across a variety of domains. Recently, algorithms have been proposed …
Stein Variational Evolution Strategies
Stein Variational Gradient Descent (SVGD) is a highly efficient method to sample from an
unnormalized probability distribution. However, the SVGD update relies on gradients of the …
Investigating the Impact of Action Representations in Policy Gradient Algorithms
Reinforcement learning (RL) is a versatile framework for learning to solve complex real-world
tasks. However, influences on the learning performance of RL algorithms are often …
Seraph: A Performance-Cost Aware Tuner for Training Reinforcement Learning Model on Serverless Computing
Training a reinforcement learning model is critical for various AI tasks. However, determining
the hardware resources required for training RL models is challenging due to the interaction …
Energy-Based Policy Constraint for Offline Reinforcement Learning
Z Peng, C Han, Y Liu, Z Zhou - CAAI International Conference on Artificial …, 2023 - Springer
Offline RL suffers from the distribution shift problem. One way to address this issue is to
constrain the divergence between the target policy and the behavior policy. However …
Nitro: Boosting Distributed Reinforcement Learning with Serverless Computing
Deep reinforcement learning (DRL) has achieved remarkable success in various fields,
including gaming AI [10, 35, 65, 70], robotics [13, 78], and system scheduling [12, 45, 49, 55] …