Distilling with performance enhanced students
arXiv preprint arXiv:1810.10460, 2018
The task of accelerating large neural networks on general-purpose hardware has, in recent years, prompted the use of channel pruning to reduce network size. However, the efficacy of pruning-based approaches has since been called into question. In this paper, we turn to distillation for model compression, specifically attention transfer, and develop a simple method for discovering performance-enhanced student networks. We combine channel saliency metrics with empirical observations of runtime performance to design more accurate networks for a given latency budget. We apply our methodology to residual and densely-connected networks, and show that we are able to find resource-efficient student networks on different hardware platforms while maintaining very high accuracy. These performance-enhanced student networks achieve up to 10% boosts in top-1 ImageNet accuracy over their channel-pruned counterparts for the same inference time.
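Attention transfer, the distillation variant the abstract names, trains the student to match the teacher's spatial attention maps at paired layers. Below is a minimal PyTorch sketch of that loss in the style of Zagoruyko and Komodakis; the function names are illustrative and the assumption that each student/teacher layer pair shares the same spatial resolution is mine, not taken from this paper.

```python
import torch
import torch.nn.functional as F


def attention_map(feat: torch.Tensor, p: int = 2) -> torch.Tensor:
    """Collapse (N, C, H, W) activations into a per-sample spatial attention
    map by summing |a|^p over channels, then L2-normalising the flattened map."""
    am = feat.abs().pow(p).sum(dim=1).flatten(start_dim=1)  # (N, H*W)
    return F.normalize(am, dim=1)


def attention_transfer_loss(student_feats, teacher_feats, p: int = 2):
    """Mean squared difference between normalised student and teacher attention
    maps, summed over paired layers (assumes matching spatial sizes)."""
    loss = 0.0
    for fs, ft in zip(student_feats, teacher_feats):
        loss = loss + (attention_map(fs, p) - attention_map(ft, p)).pow(2).mean()
    return loss
```

In practice this term is added to the usual cross-entropy loss on the student's logits with a weighting coefficient, so the student learns both the labels and where the teacher "looks".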