Policy optimization with demonstrations
… Policy Optimization from Demonstration (POfD) method, which can acquire knowledge from
demonstration … 1) We reformulate the policy optimization objective by adding a demonstration…
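The snippet above describes reformulating the policy optimization objective by adding a demonstration-guided term. As a rough illustration only, not the paper's exact formulation, the sketch below assumes a GAIL-style discriminator score `disc_prob` for each state-action pair and uses it to shape the environment reward; the function name and the weight `lam` are hypothetical.

```python
import numpy as np

def shaped_reward(env_reward, disc_prob, lam=0.1):
    """Combine the environment reward with a demonstration-guided term.

    disc_prob is a (hypothetical) discriminator's probability that the
    state-action pair looks expert-like; lam trades off the two signals.
    """
    # One common form of demonstration-guided reward shaping:
    # r' = r - lam * log(1 - D(s, a)), which favors expert-like behavior.
    return env_reward - lam * np.log(1.0 - disc_prob + 1e-8)

# Example: a transition the discriminator rates as 0.9 "expert-like"
# receives a boost on top of its environment reward of 1.0.
print(shaped_reward(1.0, 0.9))
```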
Hybrid policy optimization from imperfect demonstrations
H Yang, C Yu, S Chen - Advances in Neural Information …, 2024 - proceedings.neurips.cc
… HYbrid Policy Optimization (HYPO), which uses a small number of imperfect demonstrations
… The key idea is to train an offline guider policy using imitation learning in order to instruct an …
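One plausible reading of "an offline guider policy ... to instruct an online agent" is a regularizer that keeps the online agent close to a behavior-cloned guider. The sketch below is an assumption rather than HYPO's actual update rule; `hybrid_loss`, `beta`, and the toy discrete action distributions are illustrative.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-8):
    """KL(p || q) between two discrete action distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def hybrid_loss(rl_loss, agent_probs, guider_probs, beta=0.5):
    """RL loss plus a penalty for straying from the offline guider policy."""
    return rl_loss + beta * kl_divergence(agent_probs, guider_probs)

# Example: the online agent's action distribution is pulled toward that of
# a guider policy imitation-learned from imperfect demonstrations.
print(hybrid_loss(rl_loss=1.2, agent_probs=[0.7, 0.3], guider_probs=[0.5, 0.5]))
```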
Guarded policy optimization with imperfect online demonstrations
… In this work we develop a new guarded policy optimization … In contrast to the offline learning
from demonstrations, in this work we focus on the online deployment of teacher policies with …
Guided exploration with proximal policy optimization using a single demonstration
G Libardi, G De Fabritiis… - … Conference on Machine …, 2021 - proceedings.mlr.press
… the demonstrations … the policy only specifies a distribution over the action space. We force
the actions of the policy to equal the demonstration actions instead of sampling from the policy …
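The snippet spells out the mechanism: during demonstration replay, the action sampled from the policy distribution is overridden by the demonstration action. A minimal sketch of that action-forcing idea, with a hypothetical `rollout_action` helper and a toy discrete action distribution:

```python
import random

def rollout_action(policy_distribution, demo_action=None):
    """Pick an action: replay the demonstration action when one is given,
    otherwise sample from the policy's action distribution."""
    if demo_action is not None:
        # Force the action to equal the demonstration action instead of
        # sampling, so the trajectory follows the demonstrated behavior.
        return demo_action
    actions, probs = zip(*policy_distribution.items())
    return random.choices(actions, weights=probs, k=1)[0]

# Example: during a demonstration-replay phase the policy distribution is
# ignored and the demonstrated action is returned.
print(rollout_action({"left": 0.6, "right": 0.4}, demo_action="right"))
```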
Policy Optimization with Smooth Guidance Rewards Learned from Sparse-Reward Demonstrations
G Wang, F Wu, X Zhang, T Chen - arXiv preprint arXiv:2401.00162, 2023 - arxiv.org
… Policy Optimization with Smooth Guidance (POSG) that leverages a small set of sparse-reward
demonstrations to … be indirectly estimated using offline demonstrations rather than directly …
Iterative Regularized Policy Optimization with Imperfect Demonstrations
G Xudong, F Dawei, K Xu, Y Zhai, C Yao… - … on Machine Learning, 2024 - openreview.net
… introduce the Iterative Regularized Policy Optimization (IRPO) method … demonstration
boosting to enhance demonstration quality. Specifically, iterative training capitalizes on the policy …
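"Demonstration boosting" is mentioned only in passing here; one simple guess at such a step is to refresh the demonstration set with the highest-return trajectories produced by the current policy. The sketch below is purely illustrative and may not match IRPO's actual procedure.

```python
def boost_demonstrations(demos, rollouts, score):
    """Keep the highest-scoring trajectories from the current demo set and
    fresh policy rollouts, one plausible form of 'demonstration boosting'."""
    pool = list(demos) + list(rollouts)
    pool.sort(key=score, reverse=True)
    return pool[: len(demos)]

# Example with returns as the score: a strong rollout displaces a weak demo.
demos = [{"return": 2.0}, {"return": 5.0}]
rollouts = [{"return": 7.0}]
print(boost_demonstrations(demos, rollouts, score=lambda t: t["return"]))
```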
Learning from demonstration: Provably efficient adversarial policy imitation with linear function approximation
… 2021) analyze the constrained pessimistic policy optimization with general function
approximation and with the partial coverage assumption of the dataset, then they specialize the case …
Learning from limited demonstrations
… constraints which guide the optimization performed by Approximate Policy Iteration. We …
expert demonstrations. In all cases, we compare our algorithm with Least-Squares Policy Iteration (…
Generalize robot learning from demonstration to variant scenarios with evolutionary policy gradient
… Our EPG algorithm is derived and approximated from the optimization of perturbed policies
with policy gradient methods, by adding some heuristic of EAs. As EAs differ in how to …
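The snippet mentions optimizing perturbed policies with policy gradient methods plus an evolutionary-algorithm heuristic. As a loose illustration of the evolutionary part only, the sketch below perturbs a parameter vector with Gaussian noise and keeps the best candidate under a toy objective; all names and the objective are hypothetical.

```python
import numpy as np

def perturb_and_select(theta, evaluate, sigma=0.1, population=8, rng=None):
    """Evaluate Gaussian perturbations of the policy parameters and keep the
    best one, a simple evolutionary heuristic layered on top of a
    gradient-trained parameter vector."""
    rng = rng or np.random.default_rng(0)
    candidates = [theta + sigma * rng.standard_normal(theta.shape)
                  for _ in range(population)]
    return max(candidates, key=evaluate)

# Example with a toy objective: prefer parameters close to the all-ones vector.
theta = np.zeros(3)
best = perturb_and_select(theta, evaluate=lambda t: -np.sum((t - 1.0) ** 2))
print(best)
```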
Safe driving via expert guided policy optimization
… demonstration as two training sources. Following such a setting, we develop a novel Expert
Guided Policy Optimization (… of an expert policy to generate demonstration and a switch …
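The snippet points to two ingredients: an expert policy that generates demonstrations and a switch. Below is a minimal sketch of a switch of that kind, in which the expert takes over on risky actions and its interventions are stored as demonstration data; the `is_risky` check and the steering-magnitude example are assumptions.

```python
def guarded_step(agent_action, expert_action, is_risky, demo_buffer):
    """Let the expert take over when the agent's action is judged risky,
    and record the expert's intervention as demonstration data."""
    if is_risky(agent_action):
        demo_buffer.append(expert_action)  # intervention becomes a demonstration
        return expert_action
    return agent_action

# Example: a hypothetical risk check based on steering magnitude.
buffer = []
action = guarded_step(
    agent_action=0.9, expert_action=0.2,
    is_risky=lambda a: abs(a) > 0.8, demo_buffer=buffer,
)
print(action, buffer)
```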
Related searches
- policy optimization imperfect demonstrations
- single demonstration proximal policy optimization
- multi-objective policy optimization
- constrained policy optimization
- offline policy optimization
- policy optimization sparse rewards
- policy optimization regularization matters
- policy optimization conservative exploration
- policy optimization linear function approximation
- policy optimization safe driving
- policy gradient robust optimization
- guided exploration proximal policy optimization