Knowledge distillation: A survey

J Gou, B Yu, SJ Maybank, D Tao - International Journal of Computer Vision, 2021 - Springer
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …

Vanillanet: the power of minimalism in deep learning

H Chen, Y Wang, J Guo, D Tao - Advances in Neural …, 2024 - proceedings.neurips.cc
At the heart of foundation models is the philosophy of "more is different", exemplified by the
astonishing success in computer vision and natural language processing. However, the …

Vision transformer pruning

M Zhu, Y Tang, K Han - arXiv preprint arXiv:2104.08500, 2021 - arxiv.org
Vision transformers have achieved competitive performance on a variety of computer vision
applications. However, their storage, run-time memory, and computational demands are …

CrossKD: Cross-head knowledge distillation for object detection

J Wang, Y Chen, Z Zheng, X Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
Knowledge Distillation (KD) has been validated as an effective model compression
technique for learning compact object detectors. Existing state-of-the-art KD methods for …

Prune your model before distill it

J Park, A No - European Conference on Computer Vision, 2022 - Springer
Knowledge distillation transfers the knowledge from a cumbersome teacher to a
small student. Recent results suggest that the student-friendly teacher is more appropriate to …

Undistillable: Making a nasty teacher that cannot teach students

H Ma, T Chen, TK Hu, C You, X Xie, Z Wang - arXiv preprint arXiv …, 2021 - arxiv.org
Knowledge Distillation (KD) is a widely used technique to transfer knowledge from pre-
trained teacher models to (usually more lightweight) student models. However, in certain …
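
As a hedged illustration of what "transferring knowledge from a teacher to a student" typically means in response-based KD (this is the generic soft-label distillation loss, not the "nasty teacher" construction studied in the paper above; the function name `kd_loss` and the temperature and weight values are illustrative assumptions):

```python
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    """Generic response-based KD: KL divergence between temperature-softened
    teacher and student distributions, blended with ordinary cross-entropy."""
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # The T^2 factor keeps the gradient scale of the soft term comparable to the hard CE term.
    distill = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    hard = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * hard
```

The temperature controls how much of the teacher's inter-class similarity structure is exposed to the student: higher values flatten the teacher distribution and emphasize relative probabilities of the non-target classes.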

Comi: Correct and mitigate shortcut learning behavior in deep neural networks

L Zhao, Q Liu, L Yue, W Chen, L Chen, R Sun… - Proceedings of the 47th …, 2024 - dl.acm.org
Deep Neural Networks (DNNs), despite their notable progress across information retrieval
tasks, encounter the issues of shortcut learning and struggle with poor generalization due to …

Generalized knowledge distillation via relationship matching

HJ Ye, S Lu, DC Zhan - IEEE Transactions on Pattern Analysis …, 2022 - ieeexplore.ieee.org
The knowledge of a well-trained deep neural network (aka the “teacher”) is valuable for
learning similar tasks. Knowledge distillation extracts knowledge from the teacher and …

Categories of response-based, feature-based, and relation-based knowledge distillation

C Yang, X Yu, Z An, Y Xu - … Distillation: Towards New Horizons of Intelligent …, 2023 - Springer
Deep neural networks have achieved remarkable performance for artificial intelligence
tasks. The success behind intelligent systems often relies on large-scale models with high …
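
The chapter title above names the three standard KD categories; as a hedged sketch of the least obvious one, relation-based distillation matches the structure among examples rather than individual outputs. The sketch below is in the spirit of relational KD, not this chapter's exact formulation; `relation_kd_loss` and the normalization are illustrative assumptions. It compares the pairwise-distance matrices that teacher and student embeddings induce over a mini-batch:

```python
import torch
import torch.nn.functional as F

def relation_kd_loss(student_emb, teacher_emb):
    """Relation-based KD sketch: match the pairwise-distance structure that the
    student and teacher embeddings induce over a mini-batch of examples."""
    def normalized_pairwise_dist(x):
        d = torch.cdist(x, x, p=2)                                 # (B, B) Euclidean distances
        off_diag = ~torch.eye(len(x), dtype=torch.bool, device=x.device)
        return d / d[off_diag].mean().clamp_min(1e-8)              # scale-invariant comparison
    return F.smooth_l1_loss(normalized_pairwise_dist(student_emb),
                            normalized_pairwise_dist(teacher_emb))
```

Response-based distillation, by contrast, matches final logits (as in the KD loss sketched earlier), and feature-based distillation matches intermediate activations (sketched under the feature-distillation entry below).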

Function-consistent feature distillation

D Liu, M Kan, S Shan, X Chen - arXiv preprint arXiv:2304.11832, 2023 - arxiv.org
Feature distillation makes the student mimic the intermediate features of the teacher. Nearly
all existing feature-distillation methods use L2 distance or its slight variants as the distance …
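
To make the preceding sentence concrete, here is a hedged sketch of the common L2-based feature-distillation baseline the abstract refers to (not the function-consistent method proposed in the paper; the class name `FeatureDistiller` and the 1x1-conv adapter are illustrative assumptions used to reconcile mismatched channel widths, and matching spatial resolution is assumed):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistiller(nn.Module):
    """Generic feature distillation: project the student's intermediate feature map
    to the teacher's channel width, then penalize the L2 (MSE) distance to the
    teacher's feature map so the student learns to mimic it."""
    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        self.adapter = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
        aligned = self.adapter(student_feat)
        # Teacher features are fixed targets; gradients update only the student and adapter.
        return F.mse_loss(aligned, teacher_feat.detach())
```

In training, this term is usually added to the task loss with a small weight, e.g. `loss = task_loss + beta * distiller(f_student, f_teacher)`, where `distiller`, `beta`, `f_student`, and `f_teacher` are placeholders for the module above, a scalar weight, and the chosen intermediate feature maps.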