On First-Order Meta-Reinforcement Learning with Moreau Envelopes
MT Toghani, S Perez-Salazar… - 2023 62nd IEEE …, 2023 - ieeexplore.ieee.org
Meta-Reinforcement Learning (MRL) is a promising framework for training agents that can
quickly adapt to new environments and tasks. In this work, we study the MRL problem under
the policy gradient formulation, where we propose a novel algorithm that uses Moreau
envelope surrogate regularizers to jointly learn a meta-policy that is adjustable to the
environment of each individual task. Our algorithm, called Moreau Envelope Meta-
Reinforcement Learning (MEMRL), learns a meta-policy that can adapt to a distribution of …
On First-Order Meta-Reinforcement Learning with Moreau Envelopes
M Taha Toghani, S Perez-Salazar, CA Uribe - arXiv e-prints, 2023 - ui.adsabs.harvard.edu
Meta-Reinforcement Learning (MRL) is a promising framework for training agents
that can quickly adapt to new environments and tasks. In this work, we study the MRL
problem under the policy gradient formulation, where we propose a novel algorithm that
uses Moreau envelope surrogate regularizers to jointly learn a meta-policy that is adjustable
to the environment of each individual task. Our algorithm, called Moreau Envelope Meta-
Reinforcement Learning (MEMRL), learns a meta-policy that can adapt to a distribution of …