Policy distillation - Academic Resource Search

Policy distillation

AA Rusu, SG Colmenarejo, C Gulcehre… - arXiv preprint arXiv …, 2015 - arxiv.org

… policy distillation for transferring one or more action policies … expert policies can be combined
into a single multi-task policy that … process by continually distilling the best policy to a target …
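The snippet above only hints at the mechanism. Below is a minimal Python sketch that assumes the common formulation: the student is trained to match a temperature-sharpened softmax of a teacher's Q-values under a KL loss, and samples from several teachers are pooled into one multi-task loss. The function names and the default temperature `tau = 0.01` are illustrative assumptions, not details taken from the snippet.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_kl(teacher_q, student_q, tau=0.01):
    """KL(softmax(teacher_q / tau) || softmax(student_q)) for a single state.

    A small tau sharpens the teacher's distribution toward its greedy action.
    Terms with zero teacher probability contribute nothing and are skipped.
    """
    p = softmax(teacher_q, tau)
    q = softmax(student_q)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def multi_task_loss(pairs, tau=0.01):
    """Mean distillation loss over (teacher_q, student_q) pairs pooled from several teachers."""
    losses = [distillation_kl(t, s, tau) for t, s in pairs]
    return sum(losses) / len(losses)
```

In a real training loop, the student's Q-values would come from a network and the loss would be minimized by gradient descent. The pure-Python version only shows the objective itself.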

Cited by 767 · Related articles · All 3 versions

[PDF] mlr.press

Distilling policy distillation

WM Czarnecki, R Pascanu… - The 22nd …, 2019 - proceedings.mlr.press

… entire landscape of policy distillation, … distillation techniques, that are preferred depending
on specifics of the task. Specifically a newly proposed expected entropy regularised distillation …

Cited by 136 · Related articles · All 6 versions

[PDF] arxiv.org

Dual policy distillation

KH Lai, D Zha, Y Li, X Hu - arXiv preprint arXiv:2006.04061, 2020 - arxiv.org

… peer policy will lead to policy improvement through a view of hypothetical hybrid policy. Then
based on our theoretical results, we present a disadvantageous policy distillation objective …

Cited by 52 · Related articles · All 8 versions

[PDF] hal.science

DisCoRL: Continual reinforcement learning via policy distillation

R Traoré, H Caselles-Dupré, T Lesort, T Sun… - NeurIPS Workshop on …, 2019 - hal.science

… Data sampling strategies: We evaluate the effect of two different sampling strategies to create
$D_{\pi_i}$ for policy distillation. Data sampling is a key component as the sampled dataset should …

Cited by 65 · Related articles · All 5 versions


Importance prioritized policy distillation

X Qu, YS Ong, A Gupta, P Wei, Z Sun… - Proceedings of the 28th …, 2022 - dl.acm.org

… trained teacher policies [6]. Existing PD approaches mostly treat the student policy training
as … prioritize those important frames in the PD training to enhance the distilled student policy? …
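The snippet asks how to prioritize important frames during policy distillation but does not say how importance is measured. The Python sketch below shows one generic way to do it, under a loudly stated assumption: importance is taken to be the teacher's action gap (best minus second-best Q-value). That is a hypothetical stand-in chosen for illustration, not the measure used in the cited paper. The frame layout and all function names are also hypothetical.

```python
import math
import random

def kl(p, q):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def importance(teacher_q):
    """Hypothetical importance score: the teacher's action gap.

    This is an illustrative stand-in, NOT the cited paper's importance measure.
    """
    top, second = sorted(teacher_q, reverse=True)[:2]
    return top - second

def weighted_distillation_loss(frames):
    """Importance-weighted mean of per-frame KL(teacher || student) losses.

    Each frame is a dict with hypothetical keys 'teacher_q',
    'teacher_probs' and 'student_probs'.
    Falls back to uniform weights if every frame has zero importance.
    """
    weights = [importance(f["teacher_q"]) for f in frames]
    total = sum(weights)
    if total == 0:
        weights, total = [1.0] * len(frames), float(len(frames))
    weighted = sum(w * kl(f["teacher_probs"], f["student_probs"])
                   for w, f in zip(weights, frames))
    return weighted / total

def sample_prioritized(frames, k, rng=random):
    """Draw k frame indices with replacement, proportional to importance.

    A small floor keeps zero-gap frames from never being sampled.
    """
    weights = [importance(f["teacher_q"]) + 1e-6 for f in frames]
    return rng.choices(range(len(frames)), weights=weights, k=k)
```

Weighting the loss and prioritized sampling are two alternative ways to emphasize the same frames. A training loop would typically use one or the other, not both.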

Cited by 5 · Related articles

[PDF] arxiv.org

Policy distillation and value matching in multiagent reinforcement learning

S Wadhwania, DK Kim, S Omidshafiei… - 2019 IEEE/RSJ …, 2019 - ieeexplore.ieee.org

… Combined with policy distillation, we show that DVM enables agents to visit different regions
of the state space, combine the information, and continue learning. In order to perform DVM …

Cited by 33 · Related articles · All 6 versions

[PDF] neurips.cc

Improving policy learning via language dynamics distillation

V Zhong, J Mu, L Zettlemoyer… - Advances in …, 2022 - proceedings.neurips.cc

… behaves under an expert policy. In the second reinforcement learning phase, we fine-tune
the model through policy learning, while distilling representations from the teacher. This way, …

Cited by 16 · Related articles · All 6 versions

[PDF] aaai.org

Universal trading for order execution with oracle policy distillation

Y Fang, K Ren, W Liu, D Zhou, W Zhang… - Proceedings of the …, 2021 - ojs.aaai.org

… Then we introduce our policy distillation framework and the corresponding policy optimization
algorithm in details. Without loss of generality, we take liquidation, i.e., to sell a specific …

Cited by 41 · Related articles · All 7 versions

[PDF] arxiv.org

Policy optimization by genetic distillation

T Gangwani, J Peng - arXiv preprint arXiv:1711.01012, 2017 - arxiv.org

… Here, we present Genetic Policy Optimization (GPO), a new genetic algorithm … policy
optimization. GPO uses imitation learning for policy crossover in the state space and applies policy …

Cited by 39 · Related articles · All 8 versions

[PDF] arxiv.org

Continual reinforcement learning deployed in real-life using policy distillation and sim2real transfer

R Traoré, H Caselles-Dupré, T Lesort, T Sun… - arXiv preprint arXiv …, 2019 - arxiv.org

… the distilled policy $\pi_{D_i}$. With the aggregation of several distillation datasets, we can distill
several policies into … we call a model where policy 1 and policy 2 have been distilled in, $\pi_{D_{1,2}}$. …
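The snippet describes aggregating several distillation datasets to distill multiple policies into one model. The Python sketch below illustrates the idea under a simplifying assumption: the student is tabular, and each dataset holds (state, teacher action distribution) pairs. With that setup, the student distribution minimizing the summed KL(teacher || student) at each state is simply the mean of the teacher distributions seen there. The dataset contents, the state names, and `distill_tabular` are hypothetical. The cited work uses neural network students, not a table.

```python
from collections import defaultdict

def distill_tabular(dataset):
    """Fit a tabular student from (state, teacher_probs) pairs.

    For each state, the distribution q minimizing sum_i KL(p_i || q)
    is the mean of the teacher distributions p_i recorded for that state.
    """
    sums = {}
    counts = defaultdict(int)
    for state, probs in dataset:
        if state not in sums:
            sums[state] = [0.0] * len(probs)
        sums[state] = [a + b for a, b in zip(sums[state], probs)]
        counts[state] += 1
    return {s: [v / counts[s] for v in sums[s]] for s in sums}

# Hypothetical distillation datasets collected from teacher policies 1 and 2.
d1 = [("s0", [1.0, 0.0]), ("s1", [0.0, 1.0])]
d2 = [("s2", [0.5, 0.5])]

# Aggregating both datasets yields one student in which both policies are distilled.
pi_d12 = distill_tabular(d1 + d2)
```

When two tasks share a state but their teachers disagree, the tabular student averages them. This is one reason practical setups often condition the student on a task identifier or use task-specific output heads.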

Cited by 37 · Related articles · All 5 versions