Mixed policy gradient

Machine learning applications in the resilience of interdependent critical infrastructure systems—A systematic literature review

BA Alkhaleel - International Journal of Critical Infrastructure Protection, 2023 - Elsevier

The resilience of interdependent critical infrastructure systems (ICISs) is critical for the
functioning of society and the economy. ICISs such as power grids and telecommunication …

Cited by 10 · Related articles · All 4 versions

[HTML] sciencedirect.com

[HTML] GOPS: A general optimal control problem solver for autonomous driving and industrial control applications

W Wang, Y Zhang, J Gao, Y Jiang, Y Yang… - Communications in …, 2023 - Elsevier

Solving optimal control problems serves as the basic demand of industrial control tasks.
Existing methods like model predictive control often suffer from heavy online computational …

Cited by 34 · Related articles · All 4 versions

[PDF] arxiv.org

Integrated decision and control: Toward interpretable and computationally efficient driving intelligence

Y Guan, Y Ren, Q Sun, SE Li, H Ma… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Decision and control are core functionalities of high-level automated vehicles. Current
mainstream methods, such as functional decomposition and end-to-end reinforcement …

Cited by 69 · Related articles · All 9 versions

[PDF] neurips.cc

Aligning language models with human preferences via a Bayesian approach

J Wang, H Wang, S Sun, W Li - Advances in Neural …, 2024 - proceedings.neurips.cc

In the quest to advance human-centric natural language generation (NLG) systems,
ensuring alignment between NLG models and human preferences is crucial. For this …

Cited by 9 · Related articles · All 5 versions

[PDF] researchgate.net

Policy iteration based approximate dynamic programming toward autonomous driving in constrained dynamic environment

Z Lin, J Ma, J Duan, SE Li, H Ma… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

In the area of autonomous driving, it typically brings great difficulty in solving the motion
planning problem since the vehicle model is nonlinear and the driving scenarios are …

Cited by 20 · Related articles · All 4 versions

[PDF] neurips.cc

Stochastic second-order methods improve best-known sample complexity of SGD for gradient-dominated functions

S Masiha, S Salehkaleybar, N He… - Advances in …, 2022 - proceedings.neurips.cc

We study the performance of Stochastic Cubic Regularized Newton (SCRN) on a class of
functions satisfying gradient dominance property with $1\le\alpha\le 2$ which holds in a …

Cited by 20 · Related articles · All 9 versions

[PDF] researchgate.net

Learn Zero-Constraint-Violation Safe Policy in Model-Free Constrained Reinforcement Learning

H Ma, C Liu, SE Li, S Zheng, W Sun… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

We focus on learning the zero-constraint-violation safe policy in model-free reinforcement
learning (RL). Existing model-free RL studies mostly use the posterior penalty to penalize …

Cited by 2 · Related articles · All 3 versions

[PDF] arxiv.org

Beyond exact gradients: Convergence of stochastic soft-max policy gradient methods with entropy regularization

Y Ding, J Zhang, J Lavaei - arXiv preprint arXiv:2110.10117, 2021 - arxiv.org

Entropy regularization is an efficient technique for encouraging exploration and preventing a
premature convergence of (vanilla) policy gradient methods in reinforcement learning (RL) …

Cited by 17 · Related articles · All 5 versions

[PDF] arxiv.org

Bregman gradient policy optimization

F Huang, S Gao, H Huang - arXiv preprint arXiv:2106.12112, 2021 - arxiv.org

In the paper, we design a novel Bregman gradient policy optimization framework for
reinforcement learning based on Bregman divergences and momentum techniques …

Cited by 21 · Related articles · All 4 versions

[PDF] arxiv.org

Improve generalization of driving policy at signalized intersections with adversarial learning

Y Ren, G Zhan, L Tang, SE Li, J Jiang, K Li… - … research part C …, 2023 - Elsevier

Intersections are quite challenging among various driving scenes wherein the interaction of
signal lights and distinct traffic actors poses great difficulty to learn a wise and robust driving …

Cited by 7 · Related articles · All 7 versions