Policy optimization with demonstrations
… Policy Optimization from Demonstration (POfD) method, which can acquire knowledge from
demonstration … 1) We reformulate the policy optimization objective by adding a demonstration…
Hybrid policy optimization from Imperfect demonstrations
H Yang, C Yu, S Chen - Advances in Neural Information …, 2024 - proceedings.neurips.cc
… HYbrid Policy Optimization (HYPO), which uses a small number of imperfect demonstrations
… The key idea is to train an offline guider policy using imitation learning in order to instruct an …
Guided exploration with proximal policy optimization using a single demonstration
G Libardi, G De Fabritiis… - … Conference on Machine …, 2021 - proceedings.mlr.press
… the demonstrations … the policy only specifies a distribution over the action space. We force
the actions of the policy to equal the demonstration actions instead of sampling from the policy …
Guarded policy optimization with imperfect online demonstrations
… In this work we develop a new guarded policy optimization … In contrast to the offline learning
from demonstrations, in this work we focus on the online deployment of teacher policies with …
Policy Optimization with Smooth Guidance Rewards Learned from Sparse-Reward Demonstrations
G Wang, F Wu, X Zhang, T Chen - arXiv preprint arXiv:2401.00162, 2023 - arxiv.org
… Policy Optimization with Smooth Guidance (POSG) that leverages a small set of sparse-reward
demonstrations to … be indirectly estimated using offline demonstrations rather than directly …
Iterative Regularized Policy Optimization with Imperfect Demonstrations
G Xudong, F Dawei, K Xu, Y Zhai, C Yao… - … on Machine Learning, 2024 - openreview.net
… introduce the Iterative Regularized Policy Optimization (IRPO) method … demonstration
boosting to enhance demonstration quality. Specifically, iterative training capitalizes on the policy …
Learning from demonstration: Provably efficient adversarial policy imitation with linear function approximation
… 2021) analyze the constrained pessimistic policy optimization with general function
approximation and with the partial coverage assumption of the dataset, then they specialize the case …
Learning from limited demonstrations
… constraints which guide the optimization performed by Approximate Policy Iteration. We …
expert demonstrations. In all cases, we compare our algorithm with LeastSquare Policy Iteration (…
Diffusion policy policy optimization
… On the contrary, we show that for a Diffusion Policy pre-trained from expert
demonstrations, our methodology for fine-tuning via PG updates yields robust, high-performing …
Generalize robot learning from demonstration to variant scenarios with evolutionary policy gradient
… Our EPG algorithm is derived and approximated from the optimization of perturbed policies
with policy gradient methods, by adding some heuristic of EAs. As EAs differ in how to …
Related searches
- policy optimization imperfect demonstrations
- single demonstration proximal policy optimization
- multi-objective policy optimization
- constrained policy optimization
- offline policy optimization
- policy optimization sparse rewards
- policy optimization regularization matters
- policy optimization conservative exploration
- policy optimization linear function approximation
- policy gradient robust optimization
- guided exploration proximal policy optimization