Knowledge distillation: A survey

J Gou, B Yu, SJ Maybank, D Tao - International Journal of Computer Vision, 2021 - Springer
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …

Vanillanet: the power of minimalism in deep learning

H Chen, Y Wang, J Guo, D Tao - Advances in Neural …, 2024 - proceedings.neurips.cc
At the heart of foundation models is the philosophy of "more is different", exemplified by the
astonishing success in computer vision and natural language processing. However, the …

Vision transformer pruning

M Zhu, Y Tang, K Han - arXiv preprint arXiv:2104.08500, 2021 - arxiv.org
Vision transformers have achieved competitive performance on a variety of computer vision
applications. However, their storage, run-time memory, and computational demands are …

CrossKD: Cross-head knowledge distillation for object detection

J Wang, Y Chen, Z Zheng, X Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
Knowledge Distillation (KD) has been validated as an effective model compression
technique for learning compact object detectors. Existing state-of-the-art KD methods for …

Prune your model before distill it

J Park, A No - European Conference on Computer Vision, 2022 - Springer
Knowledge distillation transfers the knowledge from a cumbersome teacher to a
small student. Recent results suggest that the student-friendly teacher is more appropriate to …

Undistillable: Making a nasty teacher that cannot teach students

H Ma, T Chen, TK Hu, C You, X Xie, Z Wang - arXiv preprint arXiv …, 2021 - arxiv.org
Knowledge Distillation (KD) is a widely used technique to transfer knowledge from pre-
trained teacher models to (usually more lightweight) student models. However, in certain …
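
As a hedged illustration of what "transferring knowledge from a teacher to a student" typically means in response-based KD (this is the generic soft-label distillation loss, not the "nasty teacher" construction studied in the paper above; the function name `kd_loss` and the temperature and weight values are illustrative assumptions):

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    """Generic response-based KD: KL divergence between temperature-softened
    teacher and student distributions, blended with ordinary cross-entropy."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # The T^2 factor keeps the gradient scale of the soft term comparable to the hard CE term.
    distill = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * hard
```

The temperature controls how much of the teacher's inter-class similarity structure is exposed to the student: higher values flatten the teacher distribution and emphasize relative probabilities of the non-target classes.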

Comi: Correct and mitigate shortcut learning behavior in deep neural networks

L Zhao, Q Liu, L Yue, W Chen, L Chen, R Sun… - Proceedings of the 47th …, 2024 - dl.acm.org
Deep Neural Networks (DNNs), despite their notable progress across information retrieval
tasks, encounter the issues of shortcut learning and struggle with poor generalization due to …

Generalized knowledge distillation via relationship matching

HJ Ye, S Lu, DC Zhan - IEEE Transactions on Pattern Analysis …, 2022 - ieeexplore.ieee.org
The knowledge of a well-trained deep neural network (aka the “teacher”) is valuable for
learning similar tasks. Knowledge distillation extracts knowledge from the teacher and …

Categories of response-based, feature-based, and relation-based knowledge distillation

C Yang, X Yu, Z An, Y Xu - … Distillation: Towards New Horizons of Intelligent …, 2023 - Springer
Deep neural networks have achieved remarkable performance for artificial intelligence
tasks. The success behind intelligent systems often relies on large-scale models with high …
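
The chapter title above names the three standard KD categories; as a hedged sketch of the least obvious one, relation-based distillation matches the structure among examples rather than individual outputs. The sketch below is in the spirit of relational KD, not this chapter's exact formulation; `relation_kd_loss` and the normalization are illustrative assumptions. It compares the pairwise-distance matrices that teacher and student embeddings induce over a mini-batch:

```python
import torch
import torch.nn.functional as F

def relation_kd_loss(student_emb, teacher_emb):
    """Relation-based KD sketch: match the pairwise-distance structure that the
    student and teacher embeddings induce over a mini-batch of examples."""
    def normalized_pairwise_dist(x):
        d = torch.cdist(x, x, p=2)                                 # (B, B) Euclidean distances
        off_diag = ~torch.eye(len(x), dtype=torch.bool, device=x.device)
        return d / d[off_diag].mean().clamp_min(1e-8)              # scale-invariant comparison
    return F.smooth_l1_loss(normalized_pairwise_dist(student_emb),
                            normalized_pairwise_dist(teacher_emb))
```

Response-based distillation, by contrast, matches final logits (as in the KD loss sketched earlier), and feature-based distillation matches intermediate activations (sketched under the feature-distillation entry below).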

Function-consistent feature distillation

D Liu, M Kan, S Shan, X Chen - arXiv preprint arXiv:2304.11832, 2023 - arxiv.org
Feature distillation makes the student mimic the intermediate features of the teacher. Nearly
all existing feature-distillation methods use L2 distance or its slight variants as the distance …
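
To make the preceding sentence concrete, here is a hedged sketch of the common L2-based feature-distillation baseline the abstract refers to (not the function-consistent method proposed in the paper; the class name `FeatureDistiller` and the 1x1-conv adapter are illustrative assumptions used to reconcile mismatched channel widths, and matching spatial resolution is assumed):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistiller(nn.Module):
    """Generic feature distillation: project the student's intermediate feature map
    to the teacher's channel width, then penalize the L2 (MSE) distance to the
    teacher's feature map so the student learns to mimic it."""
    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        self.adapter = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
        aligned = self.adapter(student_feat)
        # Teacher features are fixed targets; gradients update only the student and adapter.
        return F.mse_loss(aligned, teacher_feat.detach())
```

In training, this term is usually added to the task loss with a small weight, e.g. `loss = task_loss + beta * distiller(f_student, f_teacher)`, where `distiller`, `beta`, `f_student`, and `f_teacher` are placeholders for the module above, a scalar weight, and the chosen intermediate feature maps.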