Toward a theoretical foundation of policy optimization for learning control policies
Gradient-based methods have been widely used for system design and optimization in
diverse application domains. Recently, there has been a renewed interest in studying …
From bypass transition to flow control and data-driven turbulence modeling: an input–output viewpoint
MR Jovanović - Annual Review of Fluid Mechanics, 2021 - annualreviews.org
Transient growth and resolvent analyses are routinely used to assess nonasymptotic
properties of fluid flows. In particular, resolvent analysis can be interpreted as a special case …
Natural policy gradient primal-dual method for constrained Markov decision processes
We study sequential decision-making problems in which each agent aims to maximize the
expected total reward while satisfying a constraint on the expected total utility. We employ …
Fast global convergence of natural policy gradient methods with entropy regularization
Natural policy gradient (NPG) methods are among the most widely used policy optimization
algorithms in contemporary reinforcement learning. This class of methods is often applied in …
On the role of attention in prompt-tuning
Prompt-tuning is an emerging strategy for adapting large language models (LLMs) to
downstream tasks by learning a (soft) prompt parameter from data. Despite its success in …
Policy Optimization for Linear Control with Robustness Guarantee: Implicit Regularization and Global Convergence
Policy optimization (PO) is a key ingredient for modern reinforcement learning (RL). For
control design, certain constraints are usually enforced on the policies to optimize …
Optimizing static linear feedback: Gradient method
I Fatkhullin, B Polyak - SIAM Journal on Control and Optimization, 2021 - SIAM
The linear quadratic regulator is the fundamental problem of optimal control. Its state-
feedback version was posed and solved in the early 1960s. However, the static output feedback …
Distributed reinforcement learning for decentralized linear quadratic control: A derivative-free policy optimization approach
This article considers a distributed reinforcement learning problem for decentralized linear
quadratic (LQ) control with partial state observations and local costs. We propose a zero …
Sample complexity of linear quadratic Gaussian (LQG) control for output feedback systems
This paper studies a class of partially observed Linear Quadratic Gaussian (LQG) problems
with unknown dynamics. We establish an end-to-end sample complexity bound on learning …
Softmax policy gradient methods can take exponential time to converge
The softmax policy gradient (PG) method, which performs gradient ascent under softmax
policy parameterization, is arguably one of the de facto implementations of policy …