Constrained variational policy optimization for safe reinforcement learning

Z Liu, Z Guo, Y Yao, Z Cen, W Yu… - International …, 2023 - proceedings.mlr.press

Safe reinforcement learning (RL) trains a constraint satisfaction policy by interacting with the
environment. We aim to tackle a more challenging problem: learning a safe policy from an …

Cited by 43 Related articles All 7 versions

[PDF] arxiv.org

Trustworthy reinforcement learning against intrinsic vulnerabilities: Robustness, safety, and generalizability

M Xu, Z Liu, P Huang, W Ding, Z Cen, B Li… - arXiv preprint arXiv …, 2022 - arxiv.org

A trustworthy reinforcement learning algorithm should be competent in solving challenging
real-world problems, including robustly handling uncertainties, satisfying safety …

Cited by 45 Related articles All 2 versions

[PDF] neurips.cc

VOCE: Variational optimization with conservative estimation for offline safe reinforcement learning

J Guan, G Chen, J Ji, L Yang… - Advances in Neural …, 2024 - proceedings.neurips.cc

Offline safe reinforcement learning (RL) algorithms promise to learn policies that satisfy
safety constraints directly in offline datasets without interacting with the environment. This …

Cited by 7 Related articles All 4 versions

[PDF] neurips.cc

Constraint-conditioned policy optimization for versatile safe reinforcement learning

Y Yao, Z Liu, Z Cen, J Zhu, W Yu… - Advances in Neural …, 2024 - proceedings.neurips.cc

Safe reinforcement learning (RL) focuses on training reward-maximizing agents subject to
pre-defined safety constraints. Yet, learning versatile safe policies that can adapt to varying …

Cited by 8 Related articles All 8 versions

[PDF] thecvf.com

POCE: Primal Policy Optimization with Conservative Estimation for Multi-constraint Offline Reinforcement Learning

J Guan, L Shen, A Zhou, L Li, H Hu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Multi-constraint offline reinforcement learning (RL) promises to learn policies that satisfy
both cumulative and state-wise costs from offline datasets. This arrangement provides an …

Cited by 2 Related articles All 2 versions

[PDF] arxiv.org

Datasets and benchmarks for offline safe reinforcement learning

Z Liu, Z Guo, H Lin, Y Yao, J Zhu, Z Cen, H Hu… - arXiv preprint arXiv …, 2023 - arxiv.org

This paper presents a comprehensive benchmarking suite tailored to offline safe
reinforcement learning (RL) challenges, aiming to foster progress in the development and …

Cited by 24 Related articles All 2 versions

[PDF] arxiv.org

A Survey of Constraint Formulations in Safe Reinforcement Learning

A Wachi, X Shen, Y Sui - arXiv preprint arXiv:2402.02025, 2024 - arxiv.org

Ensuring safety is critical when applying reinforcement learning (RL) to real-world problems.
Consequently, safe RL emerges as a fundamental and powerful paradigm for safely …

Cited by 2 Related articles All 2 versions

[PDF] neurips.cc

Towards safe reinforcement learning with a safety editor policy

H Yu, W Xu, H Zhang - Advances in Neural Information …, 2022 - proceedings.neurips.cc

We consider the safe reinforcement learning (RL) problem of maximizing utility with
extremely low constraint violation rates. Assuming no prior knowledge or pre-training of the …

Cited by 27 Related articles All 6 versions

[PDF] arxiv.org

On the robustness of safe reinforcement learning under observational perturbations

Z Liu, Z Guo, Z Cen, H Zhang, J Tan, B Li… - arXiv preprint arXiv …, 2022 - arxiv.org

Safe reinforcement learning (RL) trains a policy to maximize the task reward while satisfying
safety constraints. While prior works focus on the performance optimality, we find that the …

Cited by 31 Related articles All 6 versions

[PDF] aaai.org

Beyond OOD state actions: Supported cross-domain offline reinforcement learning

J Liu, Z Zhang, Z Wei, Z Zhuang, Y Kang… - Proceedings of the …, 2024 - ojs.aaai.org

Offline reinforcement learning (RL) aims to learn a policy using only pre-collected and fixed
data. Although avoiding the time-consuming online interactions in RL, it poses challenges …

Cited by 10 Related articles All 3 versions