Distilling with performance enhanced students
arXiv preprint arXiv:1810.10460, 2018
The task of accelerating large neural networks on general-purpose hardware has, in recent years, prompted the use of channel pruning to reduce network size. However, the efficacy of pruning-based approaches has since been called into question. In this paper, we turn to distillation for model compression, specifically attention transfer, and develop a simple method for discovering performance-enhanced student networks. We combine channel saliency metrics with empirical observations of runtime performance to design more accurate networks for a given latency budget. We apply our methodology to residual and densely-connected networks, and show that we are able to find resource-efficient student networks on different hardware platforms while maintaining very high accuracy. These performance-enhanced student networks achieve up to 10% boosts in top-1 ImageNet accuracy over their channel-pruned counterparts for the same inference time.
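Attention transfer, the distillation variant the abstract names, trains the student to match the teacher's spatial attention maps at paired layers. Below is a minimal PyTorch sketch of that loss in the style of Zagoruyko and Komodakis; the function names are illustrative and the assumption that each student/teacher layer pair shares the same spatial resolution is mine, not taken from this paper.

```python
import torch
import torch.nn.functional as F


def attention_map(feat: torch.Tensor, p: int = 2) -> torch.Tensor:
    """Collapse (N, C, H, W) activations into a per-sample spatial attention
    map by summing |a|^p over channels, then L2-normalising the flattened map."""
    am = feat.abs().pow(p).sum(dim=1).flatten(start_dim=1)  # (N, H*W)
    return F.normalize(am, dim=1)


def attention_transfer_loss(student_feats, teacher_feats, p: int = 2):
    """Mean squared difference between normalised student and teacher attention
    maps, summed over paired layers (assumes matching spatial sizes)."""
    loss = 0.0
    for fs, ft in zip(student_feats, teacher_feats):
        loss = loss + (attention_map(fs, p) - attention_map(ft, p)).pow(2).mean()
    return loss
```

In practice this term is added to the usual cross-entropy loss on the student's logits with a weighting coefficient, so the student learns both the labels and where the teacher "looks".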