Policy distillation

AA Rusu, SG Colmenarejo, C Gulcehre… - arXiv preprint arXiv …, 2015 - arxiv.org
policy distillation for transferring one or more action policies … expert policies can be combined
into a single multi-task policy that … process by continually distilling the best policy to a target …
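A minimal sketch of a distillation objective of this kind. It assumes discrete actions and a teacher that outputs action values. The student is trained to match a temperature-softened distribution over the teacher's values by minimising KL(teacher || student), and a multi-task student averages this loss over frames drawn from several teachers' tasks. The helper names and the default temperature are illustrative assumptions, not taken from the snippet above.

```python
import math

def softmax(values, temperature=1.0):
    """Temperature-scaled softmax over a list of scores."""
    scaled = [v / temperature for v in values]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_q, student_logits, temperature=0.01):
    """Hypothetical helper: KL between the sharpened teacher distribution and the student policy."""
    teacher_probs = softmax(teacher_q, temperature)  # low temperature sharpens the teacher
    student_probs = softmax(student_logits, 1.0)
    return kl_divergence(teacher_probs, student_probs)

def multi_task_loss(frames):
    """Average loss over (teacher_q, student_logits) pairs pooled from several teachers' tasks."""
    losses = [distillation_loss(t, s) for t, s in frames]
    return sum(losses) / len(losses)
```

For example, `distillation_loss([1.0, 2.0, 0.5], [0.1, 0.3, 0.0])` is positive because the sharpened teacher puts almost all mass on action 1 while the student is still close to uniform; the loss is zero once the two distributions match.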

Distilling policy distillation

WM Czarnecki, R Pascanu… - The 22nd …, 2019 - proceedings.mlr.press
… entire landscape of policy distillation, … distillation techniques, that are preferred depending
on specifics of the task. Specifically a newly proposed expected entropy regularised distillation

Dual policy distillation

KH Lai, D Zha, Y Li, X Hu - arXiv preprint arXiv:2006.04061, 2020 - arxiv.org
… peer policy will lead to policy improvement through a view of hypothetical hybrid policy. Then
based on our theoretical results, we present a disadvantageous policy distillation objective …

DisCoRL: Continual reinforcement learning via policy distillation

R Traoré, H Caselles-Dupré, T Lesort, T Sun… - NeurIPS Workshop on …, 2019 - hal.science
… Data sampling strategies: We evaluate the effect of two different sampling strategies to create
D_{π_i} for policy distillation. Data sampling is a key component as the sampled dataset should …

Importance prioritized policy distillation

X Qu, YS Ong, A Gupta, P Wei, Z Sun… - Proceedings of the 28th …, 2022 - dl.acm.org
… trained teacher policies [6]. Existing PD approaches mostly treat the student policy training
as … prioritize those important frames in the PD training to enhance the distilled student policy? …
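A toy sketch of prioritising frames during distillation. The snippet does not say how importance is measured, so the score used here (the gap between the teacher's best and second-best action values) is a hypothetical stand-in, not the paper's criterion. Each frame's KL(teacher || student) term is weighted by its normalised importance, with a uniform fallback when every gap is zero.

```python
import math

def softmax(values):
    """Plain softmax over a list of scores."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def action_gap(teacher_q):
    """Hypothetical importance score: best minus second-best teacher value (needs >= 2 actions)."""
    top = sorted(teacher_q, reverse=True)
    return top[0] - top[1]

def prioritized_distillation_loss(frames):
    """Importance-weighted average of per-frame KL(teacher || student).

    frames: list of (teacher_q, student_logits) pairs.
    """
    weights = [action_gap(t) for t, _ in frames]
    total = sum(weights)
    if total == 0:
        weights = [1.0] * len(frames)  # no frame stands out: fall back to uniform weights
        total = float(len(frames))
    loss = 0.0
    for w, (t, s) in zip(weights, frames):
        loss += (w / total) * kl_divergence(softmax(t), softmax(s))
    return loss
```

With this weighting, frames where the teacher is indifferent between its top actions contribute nothing, so the student's capacity is spent on frames where the teacher's choice is decisive.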

Policy distillation and value matching in multiagent reinforcement learning

S Wadhwania, DK Kim, S Omidshafiei… - 2019 IEEE/RSJ …, 2019 - ieeexplore.ieee.org
… Combined with policy distillation, we show that DVM enables agents to visit different regions
of the state space, combine the information, and continue learning. In order to perform DVM …

Improving policy learning via language dynamics distillation

V Zhong, J Mu, L Zettlemoyer… - Advances in …, 2022 - proceedings.neurips.cc
… behaves under an expert policy. In the second reinforcement learning phase, we fine-tune
the model through policy learning, while distilling representations from the teacher. This way, …

Universal trading for order execution with oracle policy distillation

Y Fang, K Ren, W Liu, D Zhou, W Zhang… - Proceedings of the …, 2021 - ojs.aaai.org
… Then we introduce our policy distillation framework and the corresponding policy optimization
algorithm in details. Without loss of generality, we take liquidation, i.e., to sell a specific …

Policy optimization by genetic distillation

T Gangwani, J Peng - arXiv preprint arXiv:1711.01012, 2017 - arxiv.org
… Here, we present Genetic Policy Optimization (GPO), a new genetic algorithm … policy
optimization. GPO uses imitation learning for policy crossover in the state space and applies policy

Continual reinforcement learning deployed in real-life using policy distillation and sim2real transfer

R Traoré, H Caselles-Dupré, T Lesort, T Sun… - arXiv preprint arXiv …, 2019 - arxiv.org
… the distilled policy π_{D_i}. With the aggregation of several distillation datasets, we can distill
several policies into … we call a model where policy 1 and policy 2 have been distilled in, π_{D_{1,2}}. …
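A toy sketch of distilling several policies by aggregating their distillation datasets. It assumes a tabular student over discrete observations in place of a neural network, and the dataset format and helper names are hypothetical. For a tabular student, the action distribution that minimises the summed KL(teacher || student) at an observation is the average of the teacher distributions recorded there, so fitting reduces to averaging.

```python
from collections import defaultdict

def aggregate(*datasets):
    """Concatenate per-task distillation datasets of (observation, teacher_probs) pairs."""
    merged = []
    for ds in datasets:
        merged.extend(ds)
    return merged

def fit_tabular_student(dataset):
    """Return a tabular policy mapping observation -> action distribution.

    Averaging the teacher distributions recorded for an observation minimises
    the summed KL(teacher || student) for a tabular student.
    """
    sums = {}
    counts = defaultdict(int)
    for obs, probs in dataset:
        if obs not in sums:
            sums[obs] = [0.0] * len(probs)
        sums[obs] = [a + b for a, b in zip(sums[obs], probs)]
        counts[obs] += 1
    return {obs: [v / counts[obs] for v in total] for obs, total in sums.items()}

# Hypothetical datasets D_1 and D_2 collected from two task-specific teachers.
d1 = [("s0", [0.9, 0.1]), ("s1", [0.2, 0.8])]
d2 = [("s2", [0.5, 0.5]), ("s0", [0.7, 0.3])]
student_12 = fit_tabular_student(aggregate(d1, d2))  # plays the role of pi_{D_{1,2}}
```

Observations seen by only one teacher keep that teacher's distribution, while shared observations such as `"s0"` receive the average of both.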