Toward a theoretical foundation of policy optimization for learning control policies

B Hu, K Zhang, N Li, M Mesbahi… - Annual Review of …, 2023 - annualreviews.org
Gradient-based methods have been widely used for system design and optimization in
diverse application domains. Recently, there has been a renewed interest in studying …

Revisiting fundamentals of experience replay

W Fedus, P Ramachandran… - International …, 2020 - proceedings.mlr.press
Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but
there remain significant gaps in our understanding. We therefore present a systematic and …

Policy Optimization for Linear Control with Robustness Guarantee: Implicit Regularization and Global Convergence

K Zhang, B Hu, T Basar - Learning for Dynamics and Control, 2020 - proceedings.mlr.press
Policy optimization (PO) is a key ingredient for modern reinforcement learning (RL). For
control design, certain constraints are usually enforced on the policies to optimize …

Physics-informed neural network for MPC-based trajectory tracking of vehicles with noise considered

L Jin, L Liu, X Wang, M Shang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Trajectory tracking plays a vital role in unmanned driving technology. Although
traditional control schemes may yield satisfactory outcomes in dealing with simple linear …

Learning optimal controllers for linear systems with multiplicative noise via policy gradient

B Gravell, PM Esfahani… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
The linear quadratic regulator (LQR) problem has reemerged as an important theoretical
benchmark for reinforcement learning-based control of complex dynamical systems with …

Policy gradient methods for the noisy linear quadratic regulator over a finite horizon

B Hambly, R Xu, H Yang - SIAM Journal on Control and Optimization, 2021 - SIAM
We explore reinforcement learning methods for finding the optimal policy in the linear
quadratic regulator (LQR) problem. In particular we consider the convergence of policy …

Learning the globally optimal distributed LQ regulator

L Furieri, Y Zheng… - Learning for Dynamics …, 2020 - proceedings.mlr.press
We study model-free learning methods for the finite-horizon output-feedback Linear Quadratic
(LQ) control problem subject to subspace constraints on the control policy. Subspace …

On the stability and convergence of robust adversarial reinforcement learning: A case study on linear quadratic systems

K Zhang, B Hu, T Basar - Advances in Neural Information …, 2020 - proceedings.neurips.cc
Reinforcement learning (RL) algorithms can fail to generalize due to the gap between the
simulation and the real world. One standard remedy is to use robust adversarial RL (RARL) …

Complexity of Derivative-Free Policy Optimization for Structured Control

X Guo, D Keivan, G Dullerud… - Advances in Neural …, 2024 - proceedings.neurips.cc
The applications of direct policy search in reinforcement learning and continuous control
have received increasing attention. In this work, we present novel theoretical results on the …

Model-free learning with heterogeneous dynamical systems: A federated LQR approach

H Wang, LF Toso, A Mitra, J Anderson - arXiv preprint arXiv:2308.11743, 2023 - arxiv.org
We study a model-free federated linear quadratic regulator (LQR) problem where M agents
with unknown, distinct yet similar dynamics collaboratively learn an optimal policy to …