Safe reinforcement learning with stability guarantee for motion planning of autonomous vehicles
IEEE Transactions on Neural Networks and Learning Systems, 2021 • ieeexplore.ieee.org
Reinforcement learning with safety constraints is promising for autonomous vehicles, for which failures can result in disastrous losses. In general, a safe policy is trained by constrained optimization algorithms, which require the average constraint return, a function of states and actions, to stay below a predefined bound. However, most existing safe learning-based algorithms capture states via multiple high-precision sensors, which complicates the hardware and consumes considerable power. This article focuses on safe motion planning with a stability guarantee for autonomous vehicles of limited size and power. To this end, a risk-identification method and a Lyapunov function are integrated with the well-known soft actor–critic (SAC) algorithm. By borrowing the concept of Lyapunov functions from control theory, the learned policy can theoretically guarantee that the state trajectory always stays in a safe area. A novel risk-sensitive learning-based algorithm with a stability guarantee is proposed to train policies for the motion planning of autonomous vehicles. The learned policy is implemented on a differential-drive vehicle in a simulation environment. The experimental results show that the proposed algorithm achieves a higher success rate than SAC.
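The abstract does not state the exact form of the Lyapunov constraint, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes (none of this comes from the paper) the squared distance to the goal as a candidate Lyapunov function L, forward-Euler unicycle kinematics for the differential-drive vehicle, a decrease condition "batch mean of L(s') at most (1 - alpha) times batch mean of L(s)", and a Lagrangian relaxation with dual ascent on the multiplier, a common way to attach a constraint to a SAC-style actor loss. All function names, gains and the go-to-goal stand-in policy are hypothetical.

```python
import math
import random
from typing import List, Tuple

State = Tuple[float, float, float]  # (x, y, heading theta) of the vehicle
Goal = Tuple[float, float]


def diff_drive_step(s: State, v: float, omega: float, dt: float = 0.1) -> State:
    """Forward-Euler unicycle kinematics of a differential-drive vehicle."""
    x, y, th = s
    return (x + v * math.cos(th) * dt, y + v * math.sin(th) * dt, th + omega * dt)


def lyapunov(s: State, goal: Goal) -> float:
    """Hypothetical candidate Lyapunov function: squared distance to the goal."""
    return (s[0] - goal[0]) ** 2 + (s[1] - goal[1]) ** 2


def go_to_goal(s: State, goal: Goal, k_v: float = 0.5, k_w: float = 2.0) -> Tuple[float, float]:
    """Hypothetical stand-in for a learned policy: turn toward the goal and
    drive forward only while roughly facing it."""
    dx, dy = goal[0] - s[0], goal[1] - s[1]
    err = math.atan2(dy, dx) - s[2]
    err = math.atan2(math.sin(err), math.cos(err))  # wrap heading error to [-pi, pi]
    v = k_v * math.hypot(dx, dy) * max(0.0, math.cos(err))
    return v, k_w * err


def decrease_violation(transitions: List[Tuple[State, State]], goal: Goal, alpha: float) -> float:
    """Sample mean of L(s') - (1 - alpha) * L(s). A value <= 0 means the batch
    mean of L(s') is at most (1 - alpha) times the batch mean of L(s)."""
    terms = [lyapunov(s2, goal) - (1.0 - alpha) * lyapunov(s, goal) for s, s2 in transitions]
    return sum(terms) / len(terms)


def dual_ascent(lmbda: float, violation: float, lr: float = 1.0) -> float:
    """Projected gradient ascent on the Lagrange multiplier, kept non-negative."""
    return max(0.0, lmbda + lr * violation)


def penalized_actor_loss(sac_actor_loss: float, lmbda: float, violation: float) -> float:
    """Lagrangian: the multiplier-weighted violation is added to the usual SAC actor loss."""
    return sac_actor_loss + lmbda * violation


# Usage: sample states, roll one step with the stand-in policy, update the multiplier.
rng = random.Random(0)
goal = (0.0, 0.0)
batch = []
for _ in range(256):
    s = (rng.uniform(-5, 5), rng.uniform(-5, 5), rng.uniform(-math.pi, math.pi))
    batch.append((s, diff_drive_step(s, *go_to_goal(s, goal))))
viol = decrease_violation(batch, goal, alpha=0.0)
lam = dual_ascent(0.5, viol)
print(f"violation={viol:.4f}, lambda={lam:.4f}")
```

In an actual SAC implementation, the violation would be estimated with a learned Lyapunov critic and differentiated with respect to the policy parameters. Here it is evaluated only as a number on sampled transitions, to show how a satisfied constraint drives the multiplier toward zero and a violated one makes it grow.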