Toward a theoretical foundation of policy optimization for learning control policies

B Hu, K Zhang, N Li, M Mesbahi… - Annual Review of …, 2023 - annualreviews.org
Gradient-based methods have been widely used for system design and optimization in
diverse application domains. Recently, there has been a renewed interest in studying …

Statistical learning theory for control: A finite-sample perspective

A Tsiamis, I Ziemann, N Matni… - IEEE Control Systems …, 2023 - ieeexplore.ieee.org
Learning algorithms have become an integral component to modern engineering solutions.
Examples range from self-driving cars and recommender systems to finance and even …

Complexity of Derivative-Free Policy Optimization for Structured H∞ Control

X Guo, D Keivan, G Dullerud… - Advances in Neural …, 2024 - proceedings.neurips.cc
The applications of direct policy search in reinforcement learning and continuous control
have received increasing attention. In this work, we present novel theoretical results on the …

Model-free learning with heterogeneous dynamical systems: A federated LQR approach

H Wang, LF Toso, A Mitra, J Anderson - arXiv preprint arXiv:2308.11743, 2023 - arxiv.org
We study a model-free federated linear quadratic regulator (LQR) problem where M agents
with unknown, distinct yet similar dynamics collaboratively learn an optimal policy to …

Global Convergence of Direct Policy Search for State-Feedback H∞ Robust Control: A Revisit of Nonsmooth Synthesis with Goldstein Subdifferential

X Guo, B Hu - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
Direct policy search has been widely applied in modern reinforcement learning and
continuous control. However, the theoretical properties of direct policy search on nonsmooth …

Learning the Kalman filter with fine-grained sample complexity

X Zhang, B Hu, T Başar - 2023 American Control Conference …, 2023 - ieeexplore.ieee.org
We develop the first end-to-end sample complexity of model-free policy gradient (PG)
methods in discrete-time infinite-horizon Kalman filtering. Specifically, we introduce the …

Global convergence of policy gradient primal–dual methods for risk-constrained LQRs

F Zhao, K You, T Başar - IEEE Transactions on Automatic …, 2023 - ieeexplore.ieee.org
While the techniques in optimal control theory are often model-based, the policy optimization
(PO) approach directly optimizes the performance metric of interest. Even though it has been …

Revisiting LQR control from the perspective of receding-horizon policy gradient

X Zhang, T Başar - IEEE Control Systems Letters, 2023 - ieeexplore.ieee.org
We revisit in this letter the discrete-time linear quadratic regulator (LQR) problem from the
perspective of receding-horizon policy gradient (RHPG), a newly developed model-free …

Convergence and sample complexity of policy gradient methods for stabilizing linear systems

F Zhao, X Fu, K You - IEEE Transactions on Automatic Control, 2024 - ieeexplore.ieee.org
System stabilization via policy gradient (PG) methods has drawn increasing attention in both
control and machine learning communities. In this paper, we study their convergence and …

How are policy gradient methods affected by the limits of control?

I Ziemann, A Tsiamis, H Sandberg… - 2022 IEEE 61st …, 2022 - ieeexplore.ieee.org
We study stochastic policy gradient methods from the perspective of control-theoretic
limitations. Our main result is that ill-conditioned linear systems in the sense of Doyle …