Knowledge distillation: A survey
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …
Vanillanet: the power of minimalism in deep learning
At the heart of foundation models is the philosophy of "more is different", exemplified by the
astonishing success in computer vision and natural language processing. However, the …
CrossKD: Cross-head knowledge distillation for object detection
Knowledge Distillation (KD) has been validated as an effective model compression
technique for learning compact object detectors. Existing state-of-the-art KD methods for …
Prune your model before distill it
J Park, A No - European Conference on Computer Vision, 2022 - Springer
Knowledge distillation transfers the knowledge from a cumbersome teacher to a
small student. Recent results suggest that the student-friendly teacher is more appropriate to …
Undistillable: Making a nasty teacher that cannot teach students
Knowledge Distillation (KD) is a widely used technique to transfer knowledge from pre-
trained teacher models to (usually more lightweight) student models. However, in certain …
Comi: Correct and mitigate shortcut learning behavior in deep neural networks
Deep Neural Networks (DNNs), despite their notable progress across information retrieval
tasks, encounter the issues of shortcut learning and struggle with poor generalization due to …
Generalized knowledge distillation via relationship matching
The knowledge of a well-trained deep neural network (aka the “teacher”) is valuable for
learning similar tasks. Knowledge distillation extracts knowledge from the teacher and …
Categories of response-based, feature-based, and relation-based knowledge distillation
Deep neural networks have achieved remarkable performance for artificial intelligence
tasks. The success behind intelligent systems often relies on large-scale models with high …
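The response-based category mentioned above is the classic formulation: the student matches the teacher's softened output distribution. Below is a minimal PyTorch sketch of that loss (Hinton-style softened-softmax KL plus cross-entropy); the function name, temperature T, and weight alpha are illustrative assumptions, not values taken from the cited works.

```python
import torch.nn.functional as F

def response_kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Response-based KD: match softened teacher/student output distributions."""
    # KL divergence between temperature-softened distributions, scaled by T^2
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Standard cross-entropy against the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```

In a training loop this term replaces, or augments, the usual cross-entropy objective for the student.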
Function-consistent feature distillation
Feature distillation makes the student mimic the intermediate features of the teacher. Nearly
all existing feature-distillation methods use L2 distance or its slight variants as the distance …
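As a rough illustration of the plain L2 baseline the snippet refers to, the sketch below makes a student feature map mimic a teacher feature map via mean-squared error; the 1x1 convolution adapter and the class name are assumptions added for channel alignment, not part of the cited method.

```python
import torch.nn as nn
import torch.nn.functional as F

class FeatureKDLoss(nn.Module):
    """Plain L2 feature distillation: student mimics a teacher feature map."""

    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # 1x1 conv projects student features into the teacher's channel width
        self.adapter = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        # Penalize the mean squared (L2) distance to the detached teacher features,
        # so gradients flow only into the student and the adapter
        return F.mse_loss(self.adapter(student_feat), teacher_feat.detach())
```

Detaching the teacher features keeps the teacher frozen; variants of this baseline mostly differ in where the features are tapped and which distance replaces the plain L2 term.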