Toward a theoretical foundation of policy optimization for learning control policies

B Hu, K Zhang, N Li, M Mesbahi… - Annual Review of …, 2023 - annualreviews.org
Gradient-based methods have been widely used for system design and optimization in
diverse application domains. Recently, there has been a renewed interest in studying …
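A canonical instance behind this line of work is policy optimization over static feedback gains for the linear quadratic regulator (LQR); in standard notation (my sketch of the usual setup, not quoted from the review):

\[
\min_{K}\; J(K) = \mathbb{E}_{x_0 \sim \mathcal{D}} \sum_{t=0}^{\infty} \left( x_t^\top Q x_t + u_t^\top R u_t \right),
\qquad x_{t+1} = A x_t + B u_t, \quad u_t = -K x_t,
\]

minimized over the stabilizing set \(\{K : \rho(A - BK) < 1\}\). Although \(J\) is nonconvex in \(K\), it satisfies a gradient dominance (Polyak-Łojasiewicz) condition on this set, which is what allows gradient methods to converge globally.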

From bypass transition to flow control and data-driven turbulence modeling: an input–output viewpoint

MR Jovanović - Annual Review of Fluid Mechanics, 2021 - annualreviews.org
Transient growth and resolvent analyses are routinely used to assess nonasymptotic
properties of fluid flows. In particular, resolvent analysis can be interpreted as a special case …
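Concretely, resolvent analysis evaluates the singular values of the resolvent operator \((\mathrm{i}\omega I - A)^{-1}\) of the linearized dynamics across temporal frequencies \(\omega\); a minimal numerical sketch (toy non-normal operator of my own choosing, not from the article):

import numpy as np

def resolvent_gains(A, omegas):
    # Largest singular value of the resolvent (i*omega*I - A)^{-1} at each
    # temporal frequency omega; peaks mark the frequencies at which harmonic
    # forcing is most amplified by the linearized dynamics.
    I = np.eye(A.shape[0])
    return np.array([
        np.linalg.svd(np.linalg.inv(1j * w * I - A), compute_uv=False)[0]
        for w in omegas
    ])

# Stable but highly non-normal: eigenvalues at -0.05, yet huge input-output gain.
A = np.array([[-0.05, 50.0],
              [ 0.00, -0.05]])
print(resolvent_gains(A, np.linspace(-1.0, 1.0, 201)).max())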

Natural policy gradient primal-dual method for constrained Markov decision processes

D Ding, K Zhang, T Başar… - Advances in Neural …, 2020 - proceedings.neurips.cc
We study sequential decision-making problems in which each agent aims to maximize the
expected total reward while satisfying a constraint on the expected total utility. We employ …
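The standard primal-dual template for constrained MDPs, in notation of my own (the paper's algorithm may differ in details), pairs the constrained program with its Lagrangian,

\[
\max_{\theta}\; V_r(\pi_\theta) \ \ \text{s.t.}\ \ V_g(\pi_\theta) \ge b,
\qquad
L(\theta, \lambda) = V_r(\pi_\theta) + \lambda \left( V_g(\pi_\theta) - b \right),
\]

updating the primal variable by a natural policy gradient ascent step on \(L(\cdot, \lambda_t)\) and the dual variable by projected subgradient descent, \(\lambda_{t+1} = [\lambda_t - \eta ( V_g(\pi_{\theta_t}) - b )]_+\).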

Fast global convergence of natural policy gradient methods with entropy regularization

S Cen, C Cheng, Y Chen, Y Wei… - Operations …, 2022 - pubsonline.informs.org
Natural policy gradient (NPG) methods are among the most widely used policy optimization
algorithms in contemporary reinforcement learning. This class of methods is often applied in …
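For tabular softmax policies the entropy-regularized NPG update has a closed multiplicative form; as I understand the result (notation mine; see the paper for the exact statement), with step size \(\eta\), regularization weight \(\tau\), and soft Q-function \(Q_\tau^{(t)}\):

\[
\pi^{(t+1)}(a \mid s) \;\propto\; \left( \pi^{(t)}(a \mid s) \right)^{1 - \frac{\eta\tau}{1-\gamma}} \exp\!\left( \frac{\eta}{1-\gamma}\, Q_\tau^{(t)}(s,a) \right),
\]

which at the largest admissible step size \(\eta = (1-\gamma)/\tau\) reduces to soft policy iteration, \(\pi^{(t+1)} \propto \exp(Q_\tau^{(t)}/\tau)\), and converges linearly.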

On the role of attention in prompt-tuning

S Oymak, AS Rawat, M Soltanolkotabi… - International …, 2023 - proceedings.mlr.press
Prompt-tuning is an emerging strategy to adapt large language models (LLMs) to
downstream tasks by learning a (soft-) prompt parameter from data. Despite its success in …
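Mechanically, soft prompt-tuning prepends trainable vectors to the frozen token sequence so that attention can route information through them; a toy numpy illustration of that mechanism (my construction, not the paper's model):

import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def prompt_attention(X, P, Wq, Wk, Wv):
    # Prepend the trainable prompt P to the frozen inputs X, then apply one
    # frozen attention layer; only P would be updated during tuning, and it
    # steers where the input tokens attend.
    Z = np.vstack([P, X])                  # (p + n, d): keys/values see the prompt
    Q, K, V = X @ Wq, Z @ Wk, Z @ Wv       # queries come from the inputs only
    scores = softmax(Q @ K.T / np.sqrt(K.shape[1]))
    return scores @ V

d, n, p = 8, 5, 2
rng = np.random.default_rng(0)
X = rng.normal(size=(n, d))                # frozen token embeddings
P = rng.normal(size=(p, d))                # the (soft-)prompt parameter
Wq = Wk = Wv = np.eye(d)                   # stand-ins for frozen weights
print(prompt_attention(X, P, Wq, Wk, Wv).shape)   # (5, 8)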

Policy Optimization for Linear Control with Robustness Guarantee: Implicit Regularization and Global Convergence

K Zhang, B Hu, T Başar - Learning for Dynamics and Control, 2020 - proceedings.mlr.press
Policy optimization (PO) is a key ingredient for modern reinforcement learning (RL). For
control design, certain constraints are usually enforced on the policies to optimize …
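The prototypical constrained design here is mixed \(\mathcal{H}_2/\mathcal{H}_\infty\) control: minimize an LQ (\(\mathcal{H}_2\)) cost over gains that also meet an \(\mathcal{H}_\infty\) robustness bound. In my notation:

\[
\min_{K}\; J_2(K) \quad \text{s.t.} \quad K \ \text{stabilizing}, \ \ \| T_{zw}(K) \|_{\mathcal{H}_\infty} < \gamma,
\]

the implicit-regularization phenomenon being that suitable policy gradient iterates (e.g., natural gradient or Gauss-Newton steps) stay inside this robustness-constrained feasible set without any explicit projection.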

Optimizing static linear feedback: Gradient method

I Fatkhullin, B Polyak - SIAM Journal on Control and Optimization, 2021 - SIAM
The linear quadratic regulator is the fundamental problem of optimal control. Its state
feedback version was posed and solved in the early 1960s. However, the static output feedback …
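For the state-feedback LQR case that the abstract contrasts with, the cost gradient has a closed form via two Lyapunov solves, so plain gradient descent is easy to sketch (a discrete-time toy of mine; the paper's focus is the harder static output feedback setting):

import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def lqr_cost_and_grad(K, A, B, Q, R, Sigma0):
    # Closed loop x_{t+1} = (A - B K) x_t with x_0 ~ Sigma0;
    # cost J(K) = trace(P_K Sigma0), gradient via two Lyapunov equations.
    Acl = A - B @ K
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)   # value matrix P_K
    S = solve_discrete_lyapunov(Acl, Sigma0)              # state covariance
    grad = 2 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ S
    return np.trace(P @ Sigma0), grad

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, Sigma0 = np.eye(2), np.eye(1), np.eye(2)
K = np.array([[1.0, 2.0]])                 # initial gain must be stabilizing
for _ in range(500):
    J, g = lqr_cost_and_grad(K, A, B, Q, R, Sigma0)
    K = K - 1e-2 * g                       # plain gradient step
print(J)                                   # cost decreases toward the optimum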

Distributed reinforcement learning for decentralized linear quadratic control: A derivative-free policy optimization approach

Y Li, Y Tang, R Zhang, N Li - IEEE Transactions on Automatic …, 2021 - ieeexplore.ieee.org
This article considers a distributed reinforcement learning problem for decentralized linear
quadratic (LQ) control with partial state observations and local costs. We propose a zero …
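The derivative-free ingredient is typically a two-point zeroth-order gradient estimator, which needs only cost evaluations from rollouts; a generic sketch (my simplification of the idea, not the paper's distributed algorithm):

import numpy as np

def zeroth_order_grad(J, K, r=0.05, samples=200, rng=None):
    # Two-point estimate: probe the cost at K +/- r*U along random unit
    # directions U; no system matrices or analytic gradients are needed.
    rng = rng or np.random.default_rng()
    d, g = K.size, np.zeros_like(K)
    for _ in range(samples):
        U = rng.normal(size=K.shape)
        U /= np.linalg.norm(U)             # uniform direction on the sphere
        g += (J(K + r * U) - J(K - r * U)) / (2 * r) * d * U
    return g / samples

# Sanity check on a quadratic, where the true gradient is 2*K.
J = lambda K: float(np.sum(K ** 2))
print(zeroth_order_grad(J, np.ones((2, 2)), rng=np.random.default_rng(0)))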

Sample complexity of linear quadratic Gaussian (LQG) control for output feedback systems

Y Zheng, L Furieri, M Kamgarpour… - Learning for Dynamics …, 2021 - proceedings.mlr.press
This paper studies a class of partially observed Linear Quadratic Gaussian (LQG) problems
with unknown dynamics. We establish an end-to-end sample complexity bound on learning …
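A typical end-to-end pipeline in this setting first estimates the system's Markov parameters from input-output data by least squares, then realizes a state-space model and synthesizes the controller. A scalar, single-trajectory sketch of that first step (toy construction of mine):

import numpy as np

def estimate_markov_params(u, y, T):
    # Least-squares fit of the first T Markov parameters [D, CB, CAB, ...]
    # from one input-output trajectory: y_t ~ sum_k G_k u_{t-k}.
    N = len(u)
    Phi = np.array([[u[t - k] for k in range(T)] for t in range(T, N)])
    G, *_ = np.linalg.lstsq(Phi, y[T:], rcond=None)
    return G

rng = np.random.default_rng(1)
u = rng.normal(size=2000)
x, y = 0.0, []
for ut in u:                               # x_{t+1} = 0.8 x_t + u_t, y_t = x_t + noise
    y.append(x + 0.01 * rng.normal())
    x = 0.8 * x + ut
print(estimate_markov_params(u, np.array(y), T=5))   # ~ [0, 1, 0.8, 0.64, 0.512]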

Softmax policy gradient methods can take exponential time to converge

G Li, Y Wei, Y Chi, Y Gu… - Conference on Learning …, 2021 - proceedings.mlr.press
The softmax policy gradient (PG) method, which performs gradient ascent under softmax
policy parameterization, is arguably one of the de facto implementations of policy …
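The iteration in question is gradient ascent on the value function under the softmax parameterization; in standard notation (mine, not quoted from the paper):

\[
\pi_\theta(a \mid s) = \frac{\exp(\theta_{s,a})}{\sum_{a'} \exp(\theta_{s,a'})},
\qquad
\theta^{(t+1)} = \theta^{(t)} + \eta\, \nabla_\theta V^{\pi_\theta}(\mu) \big|_{\theta=\theta^{(t)}},
\]

where, in the tabular case, \(\partial V^{\pi_\theta}(\mu) / \partial \theta_{s,a} = \frac{1}{1-\gamma}\, d_\mu^{\pi_\theta}(s)\, \pi_\theta(a \mid s)\, A^{\pi_\theta}(s,a)\); the lower bound exploits how vanishingly small \(d_\mu^{\pi}(s)\,\pi(a \mid s)\) factors can stall progress on poorly explored state-action pairs.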