VOCE: Variational optimization with conservative estimation for offline safe reinforcement learning
Offline safe reinforcement learning (RL) algorithms promise to learn policies that satisfy
safety constraints directly in offline datasets without interacting with the environment. This …
APIGen: Automated pipeline for generating verifiable and diverse function-calling datasets
The advancement of function-calling agent models requires diverse, reliable, and high-
quality datasets. This paper presents APIGen, an automated data generation pipeline …
Constraint-conditioned policy optimization for versatile safe reinforcement learning
Safe reinforcement learning (RL) focuses on training reward-maximizing agents subject to
pre-defined safety constraints. Yet, learning versatile safe policies that can adapt to varying …
POCE: Primal Policy Optimization with Conservative Estimation for Multi-constraint Offline Reinforcement Learning
Multi-constraint offline reinforcement learning (RL) promises to learn policies that satisfy
both cumulative and state-wise costs from offline datasets. This arrangement provides an …
Survival instinct in offline reinforcement learning
We present a novel observation about the behavior of offline reinforcement learning (RL)
algorithms: on many benchmark datasets, offline RL can produce well-performing and safe …
Safe offline reinforcement learning with feasibility-guided diffusion model
Safe offline RL is a promising way to bypass risky online interactions towards safe policy
learning. Most existing methods only enforce soft constraints, i.e., constraining safety …
A Survey of Constraint Formulations in Safe Reinforcement Learning
Ensuring safety is critical when applying reinforcement learning (RL) to real-world problems.
Consequently, safe RL emerges as a fundamental and powerful paradigm for safely …
A primal-dual-critic algorithm for offline constrained reinforcement learning
Offline constrained reinforcement learning (RL) aims to learn a policy that maximizes the
expected cumulative reward subject to constraints on expected cumulative cost using an …
Safety-aware causal representation for trustworthy offline reinforcement learning in autonomous driving
In the domain of autonomous driving, the offline Reinforcement Learning (RL) approaches
exhibit notable efficacy in addressing sequential decision-making problems from offline …
OASIS: Conditional distribution shaping for offline safe reinforcement learning
Offline safe reinforcement learning (RL) aims to train a policy that satisfies constraints using
a pre-collected dataset. Most current methods struggle with the mismatch between imperfect …