Toward a theoretical foundation of policy optimization for learning control policies
Gradient-based methods have been widely used for system design and optimization in
diverse application domains. Recently, there has been a renewed interest in studying …
Revisiting fundamentals of experience replay
W Fedus, P Ramachandran… - International …, 2020 - proceedings.mlr.press
Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but
there remain significant gaps in our understanding. We therefore present a systematic and …
Policy Optimization for Linear Control with Robustness Guarantee: Implicit Regularization and Global Convergence
Policy optimization (PO) is a key ingredient for modern reinforcement learning (RL). For
control design, certain constraints are usually enforced on the policies to optimize …
Physical-informed neural network for MPC-based trajectory tracking of vehicles with noise considered
The trajectory tracking plays a vital role in unmanned driving technology. Although
traditional control schemes may yield satisfactory outcomes in dealing with simple linear …
Learning optimal controllers for linear systems with multiplicative noise via policy gradient
B Gravell, PM Esfahani… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
The linear quadratic regulator (LQR) problem has reemerged as an important theoretical
benchmark for reinforcement learning-based control of complex dynamical systems with …
Policy gradient methods for the noisy linear quadratic regulator over a finite horizon
We explore reinforcement learning methods for finding the optimal policy in the linear
quadratic regulator (LQR) problem. In particular we consider the convergence of policy …
Learning the globally optimal distributed LQ regulator
We study model-free learning methods for the output-feedback Linear Quadratic (LQ) control
problem in finite-horizon subject to subspace constraints on the control policy. Subspace …
On the stability and convergence of robust adversarial reinforcement learning: A case study on linear quadratic systems
Reinforcement learning (RL) algorithms can fail to generalize due to the gap between the
simulation and the real world. One standard remedy is to use robust adversarial RL (RARL) …
Complexity of Derivative-Free Policy Optimization for Structured Control
The applications of direct policy search in reinforcement learning and continuous control
have received increasing attention. In this work, we present novel theoretical results on the …
Model-free learning with heterogeneous dynamical systems: A federated LQR approach
We study a model-free federated linear quadratic regulator (LQR) problem where M agents
with unknown, distinct yet similar dynamics collaboratively learn an optimal policy to …