Toward a theoretical foundation of policy optimization for learning control policies

B Hu, K Zhang, N Li, M Mesbahi… - Annual Review of …, 2023 - annualreviews.org
Gradient-based methods have been widely used for system design and optimization in
diverse application domains. Recently, there has been a renewed interest in studying …
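A canonical instance behind this line of work is policy optimization over static feedback gains for the linear quadratic regulator (LQR); in standard notation (my sketch of the usual setup, not quoted from the review):

\[
\min_{K}\; J(K) = \mathbb{E}_{x_0 \sim \mathcal{D}} \sum_{t=0}^{\infty} \left( x_t^\top Q x_t + u_t^\top R u_t \right),
\qquad x_{t+1} = A x_t + B u_t, \quad u_t = -K x_t,
\]

minimized over the stabilizing set \(\{K : \rho(A - BK) < 1\}\). Although \(J\) is nonconvex in \(K\), it satisfies a gradient dominance (Polyak-Łojasiewicz) condition on this set, which is what allows gradient methods to converge globally.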

From bypass transition to flow control and data-driven turbulence modeling: an input–output viewpoint

MR Jovanović - Annual Review of Fluid Mechanics, 2021 - annualreviews.org
Transient growth and resolvent analyses are routinely used to assess nonasymptotic
properties of fluid flows. In particular, resolvent analysis can be interpreted as a special case …
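Concretely, resolvent analysis evaluates the singular values of the resolvent operator \((\mathrm{i}\omega I - A)^{-1}\) of the linearized dynamics across temporal frequencies \(\omega\); a minimal numerical sketch (toy non-normal operator of my own choosing, not from the article):

import numpy as np

def resolvent_gains(A, omegas):
    # Largest singular value of the resolvent (i*omega*I - A)^{-1} at each
    # temporal frequency omega; peaks mark the frequencies at which harmonic
    # forcing is most amplified by the linearized dynamics.
    I = np.eye(A.shape[0])
    return np.array([
        np.linalg.svd(np.linalg.inv(1j * w * I - A), compute_uv=False)[0]
        for w in omegas
    ])

# Stable but highly non-normal: eigenvalues at -0.05, yet huge input-output gain.
A = np.array([[-0.05, 50.0],
              [ 0.00, -0.05]])
print(resolvent_gains(A, np.linspace(-1.0, 1.0, 201)).max())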

Natural policy gradient primal-dual method for constrained Markov decision processes

D Ding, K Zhang, T Başar… - Advances in Neural …, 2020 - proceedings.neurips.cc
We study sequential decision-making problems in which each agent aims to maximize the
expected total reward while satisfying a constraint on the expected total utility. We employ …
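The standard primal-dual template for constrained MDPs, in notation of my own (the paper's algorithm may differ in details), pairs the constrained program with its Lagrangian,

\[
\max_{\theta}\; V_r(\pi_\theta) \ \ \text{s.t.}\ \ V_g(\pi_\theta) \ge b,
\qquad
L(\theta, \lambda) = V_r(\pi_\theta) + \lambda \left( V_g(\pi_\theta) - b \right),
\]

updating the primal variable by a natural policy gradient ascent step on \(L(\cdot, \lambda_t)\) and the dual variable by projected subgradient descent, \(\lambda_{t+1} = [\lambda_t - \eta ( V_g(\pi_{\theta_t}) - b )]_+\).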

Fast global convergence of natural policy gradient methods with entropy regularization

S Cen, C Cheng, Y Chen, Y Wei… - Operations …, 2022 - pubsonline.informs.org
Natural policy gradient (NPG) methods are among the most widely used policy optimization
algorithms in contemporary reinforcement learning. This class of methods is often applied in …
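For tabular softmax policies the entropy-regularized NPG update has a closed multiplicative form; as I understand the result (notation mine; see the paper for the exact statement), with step size \(\eta\), regularization weight \(\tau\), and soft Q-function \(Q_\tau^{(t)}\):

\[
\pi^{(t+1)}(a \mid s) \;\propto\; \left( \pi^{(t)}(a \mid s) \right)^{1 - \frac{\eta\tau}{1-\gamma}} \exp\!\left( \frac{\eta}{1-\gamma}\, Q_\tau^{(t)}(s,a) \right),
\]

which at the largest admissible step size \(\eta = (1-\gamma)/\tau\) reduces to soft policy iteration, \(\pi^{(t+1)} \propto \exp(Q_\tau^{(t)}/\tau)\), and converges linearly.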

On the role of attention in prompt-tuning

S Oymak, AS Rawat, M Soltanolkotabi… - International …, 2023 - proceedings.mlr.press
Prompt-tuning is an emerging strategy to adapt large language models (LLMs) to
downstream tasks by learning a (soft-) prompt parameter from data. Despite its success in …
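Mechanically, soft prompt-tuning prepends trainable vectors to the frozen token sequence so that attention can route information through them; a toy numpy illustration of that mechanism (my construction, not the paper's model):

import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def prompt_attention(X, P, Wq, Wk, Wv):
    # Prepend the trainable prompt P to the frozen inputs X, then apply one
    # frozen attention layer; only P would be updated during tuning, and it
    # steers where the input tokens attend.
    Z = np.vstack([P, X])                  # (p + n, d): keys/values see the prompt
    Q, K, V = X @ Wq, Z @ Wk, Z @ Wv       # queries come from the inputs only
    scores = softmax(Q @ K.T / np.sqrt(K.shape[1]))
    return scores @ V

d, n, p = 8, 5, 2
rng = np.random.default_rng(0)
X = rng.normal(size=(n, d))                # frozen token embeddings
P = rng.normal(size=(p, d))                # the (soft-)prompt parameter
Wq = Wk = Wv = np.eye(d)                   # stand-ins for frozen weights
print(prompt_attention(X, P, Wq, Wk, Wv).shape)   # (5, 8)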

Policy Optimization for Linear Control with Robustness Guarantee: Implicit Regularization and Global Convergence

K Zhang, B Hu, T Başar - Learning for Dynamics and Control, 2020 - proceedings.mlr.press
Policy optimization (PO) is a key ingredient for modern reinforcement learning (RL). For
control design, certain constraints are usually enforced on the policies to optimize …
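The prototypical constrained design here is mixed \(\mathcal{H}_2/\mathcal{H}_\infty\) control: minimize an LQ (\(\mathcal{H}_2\)) cost over gains that also meet an \(\mathcal{H}_\infty\) robustness bound. In my notation:

\[
\min_{K}\; J_2(K) \quad \text{s.t.} \quad K \ \text{stabilizing}, \ \ \| T_{zw}(K) \|_{\mathcal{H}_\infty} < \gamma,
\]

the implicit-regularization phenomenon being that suitable policy gradient iterates (e.g., natural gradient or Gauss-Newton steps) stay inside this robustness-constrained feasible set without any explicit projection.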

Optimizing static linear feedback: Gradient method

I Fatkhullin, B Polyak - SIAM Journal on Control and Optimization, 2021 - SIAM
The linear quadratic regulator is the fundamental problem of optimal control. Its state
feedback version was posed and solved in the early 1960s. However, the static output feedback …
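For the state-feedback LQR case that the abstract contrasts with, the cost gradient has a closed form via two Lyapunov solves, so plain gradient descent is easy to sketch (a discrete-time toy of mine; the paper's focus is the harder static output feedback setting):

import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def lqr_cost_and_grad(K, A, B, Q, R, Sigma0):
    # Closed loop x_{t+1} = (A - B K) x_t with x_0 ~ Sigma0;
    # cost J(K) = trace(P_K Sigma0), gradient via two Lyapunov equations.
    Acl = A - B @ K
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)   # value matrix P_K
    S = solve_discrete_lyapunov(Acl, Sigma0)              # state covariance
    grad = 2 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ S
    return np.trace(P @ Sigma0), grad

A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, Sigma0 = np.eye(2), np.eye(1), np.eye(2)
K = np.array([[1.0, 2.0]])                 # initial gain must be stabilizing
for _ in range(500):
    J, g = lqr_cost_and_grad(K, A, B, Q, R, Sigma0)
    K = K - 1e-2 * g                       # plain gradient step
print(J)                                   # cost decreases toward the optimum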

Distributed reinforcement learning for decentralized linear quadratic control: A derivative-free policy optimization approach

Y Li, Y Tang, R Zhang, N Li - IEEE Transactions on Automatic …, 2021 - ieeexplore.ieee.org
This article considers a distributed reinforcement learning problem for decentralized linear
quadratic (LQ) control with partial state observations and local costs. We propose a zero …
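The derivative-free ingredient is typically a two-point zeroth-order gradient estimator, which needs only cost evaluations from rollouts; a generic sketch (my simplification of the idea, not the paper's distributed algorithm):

import numpy as np

def zeroth_order_grad(J, K, r=0.05, samples=200, rng=None):
    # Two-point estimate: probe the cost at K +/- r*U along random unit
    # directions U; no system matrices or analytic gradients are needed.
    rng = rng or np.random.default_rng()
    d, g = K.size, np.zeros_like(K)
    for _ in range(samples):
        U = rng.normal(size=K.shape)
        U /= np.linalg.norm(U)             # uniform direction on the sphere
        g += (J(K + r * U) - J(K - r * U)) / (2 * r) * d * U
    return g / samples

# Sanity check on a quadratic, where the true gradient is 2*K.
J = lambda K: float(np.sum(K ** 2))
print(zeroth_order_grad(J, np.ones((2, 2)), rng=np.random.default_rng(0)))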

Sample complexity of linear quadratic Gaussian (LQG) control for output feedback systems

Y Zheng, L Furieri, M Kamgarpour… - Learning for Dynamics …, 2021 - proceedings.mlr.press
This paper studies a class of partially observed Linear Quadratic Gaussian (LQG) problems
with unknown dynamics. We establish an end-to-end sample complexity bound on learning …
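A typical end-to-end pipeline in this setting first estimates the system's Markov parameters from input-output data by least squares, then realizes a state-space model and synthesizes the controller. A scalar, single-trajectory sketch of that first step (toy construction of mine):

import numpy as np

def estimate_markov_params(u, y, T):
    # Least-squares fit of the first T Markov parameters [D, CB, CAB, ...]
    # from one input-output trajectory: y_t ~ sum_k G_k u_{t-k}.
    N = len(u)
    Phi = np.array([[u[t - k] for k in range(T)] for t in range(T, N)])
    G, *_ = np.linalg.lstsq(Phi, y[T:], rcond=None)
    return G

rng = np.random.default_rng(1)
u = rng.normal(size=2000)
x, y = 0.0, []
for ut in u:                               # x_{t+1} = 0.8 x_t + u_t, y_t = x_t + noise
    y.append(x + 0.01 * rng.normal())
    x = 0.8 * x + ut
print(estimate_markov_params(u, np.array(y), T=5))   # ~ [0, 1, 0.8, 0.64, 0.512]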

Softmax policy gradient methods can take exponential time to converge

G Li, Y Wei, Y Chi, Y Gu… - Conference on Learning …, 2021 - proceedings.mlr.press
The softmax policy gradient (PG) method, which performs gradient ascent under softmax
policy parameterization, is arguably one of the de facto implementations of policy …
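The iteration in question is gradient ascent on the value function under the softmax parameterization; in standard notation (mine, not quoted from the paper):

\[
\pi_\theta(a \mid s) = \frac{\exp(\theta_{s,a})}{\sum_{a'} \exp(\theta_{s,a'})},
\qquad
\theta^{(t+1)} = \theta^{(t)} + \eta\, \nabla_\theta V^{\pi_\theta}(\mu) \big|_{\theta=\theta^{(t)}},
\]

where, in the tabular case, \(\partial V^{\pi_\theta}(\mu) / \partial \theta_{s,a} = \frac{1}{1-\gamma}\, d_\mu^{\pi_\theta}(s)\, \pi_\theta(a \mid s)\, A^{\pi_\theta}(s,a)\); the lower bound exploits how vanishingly small \(d_\mu^{\pi}(s)\,\pi(a \mid s)\) factors can stall progress on poorly explored state-action pairs.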