A Survey of Knowledge Distillation Research

Z Huang, S Yang, W Lin, J Ni, S Sun, Y Chen, Y Tang - Chinese Journal of Computers, 2022 - 159.226.43.17
Abstract: High-performance deep learning networks are typically compute- and parameter-intensive, which makes them difficult to deploy on resource-constrained edge devices. To run deep learning models on low-resource devices, efficient small-scale networks need to be developed …

Knowledge distillation: A survey

J Gou, B Yu, SJ Maybank, D Tao - International Journal of Computer Vision, 2021 - Springer
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …

Patient knowledge distillation for BERT model compression

S Sun, Y Cheng, Z Gan, J Liu - arXiv preprint arXiv:1908.09355, 2019 - arxiv.org
Pre-trained language models such as BERT have proven to be highly effective for natural
language processing (NLP) tasks. However, the high demand for computing resources in …

Knowledge distillation via instance relationship graph

Y Liu, J Cao, B Li, C Yuan, W Hu… - Proceedings of the …, 2019 - openaccess.thecvf.com
The key challenge of knowledge distillation is to extract general, moderate and sufficient
knowledge from a teacher network to guide a student network. In this paper, a novel …
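
The generic idea behind relation-based distillation can be illustrated with a short sketch. The code below (PyTorch assumed; all names and shapes are illustrative) trains a student to reproduce the teacher's pairwise instance similarities within a batch. It is a simplified stand-in for, not the exact instance-relationship-graph loss of, the cited paper.

import torch
import torch.nn.functional as F

def relation_loss(student_feats, teacher_feats):
    """MSE between the batch-wise pairwise similarity matrices of student and teacher."""
    s = F.normalize(student_feats, dim=-1)   # L2-normalize so similarities are cosine-like
    t = F.normalize(teacher_feats, dim=-1)
    return F.mse_loss(s @ s.t(), t @ t.t())  # match instance-to-instance relations, not raw outputs

# Dummy batch of 16 instances; the two networks may use different embedding widths.
student_feats = torch.randn(16, 64, requires_grad=True)
teacher_feats = torch.randn(16, 256)
loss = relation_loss(student_feats, teacher_feats)
loss.backward()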

Recent progresses in deep learning based acoustic models

D Yu, J Li - IEEE/CAA Journal of Automatica Sinica, 2017 - ieeexplore.ieee.org
In this paper, we summarize recent progress made in deep learning based acoustic
models and the motivation and insights behind the surveyed techniques. We first discuss …

Towards model compression for deep learning based speech enhancement

K Tan, DL Wang - IEEE/ACM Transactions on Audio, Speech …, 2021 - ieeexplore.ieee.org
The use of deep neural networks (DNNs) has dramatically elevated the performance of
speech enhancement over the last decade. However, to achieve strong enhancement …

Large-scale domain adaptation via teacher-student learning

J Li, ML Seltzer, X Wang, R Zhao, Y Gong - arXiv preprint arXiv …, 2017 - arxiv.org
High accuracy speech recognition requires a large amount of transcribed data for
supervised training. In the absence of such data, domain adaptation of a well-trained …

Conditional teacher-student learning

Z Meng, J Li, Y Zhao, Y Gong - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
Teacher-student (T/S) learning has been shown to be effective for a variety of problems
such as domain adaptation and model compression. One shortcoming of T/S learning is …
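
For readers unfamiliar with the basic teacher-student objective these speech papers build on, the following sketch (PyTorch assumed; the tiny feed-forward models, feature shapes, and hyperparameters are purely illustrative) trains a student to match the teacher's output posteriors on unlabeled data, so no transcriptions are required.

import torch
import torch.nn.functional as F

teacher = torch.nn.Sequential(torch.nn.Linear(40, 512), torch.nn.ReLU(), torch.nn.Linear(512, 100))
student = torch.nn.Sequential(torch.nn.Linear(40, 128), torch.nn.ReLU(), torch.nn.Linear(128, 100))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

features = torch.randn(32, 40)  # a batch of unlabeled acoustic features (hypothetical)

with torch.no_grad():
    teacher_post = F.softmax(teacher(features), dim=-1)  # teacher posteriors serve as soft labels

student_log_post = F.log_softmax(student(features), dim=-1)
loss = F.kl_div(student_log_post, teacher_post, reduction="batchmean")  # student mimics the teacher

optimizer.zero_grad()
loss.backward()
optimizer.step()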

Lessons from building acoustic models with a million hours of speech

SHK Parthasarathi, N Strom - ICASSP 2019-2019 IEEE …, 2019 - ieeexplore.ieee.org
This is a report of our lessons learned building acoustic models from 1 million hours of
unlabeled speech, while labeled speech is restricted to 7,000 hours. We employ …

Efficient knowledge distillation for RNN-transducer models

S Panchapagesan, DS Park, CC Chiu… - ICASSP 2021-2021 …, 2021 - ieeexplore.ieee.org
Knowledge Distillation is an effective method of transferring knowledge from a large model
to a smaller model. Distillation can be viewed as a type of model compression, and has …
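
The standard distillation loss referred to here can be written as a short, generic sketch (PyTorch assumed). This is the classification-style loss with temperature-softened teacher targets, not the RNN-T-specific variant developed in the cited paper, and all shapes and hyperparameters below are illustrative.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with KL divergence to the teacher's softened outputs."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_targets, reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Usage with dummy tensors: 8 examples, 10 classes.
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()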