Knowledge distillation and student-teacher learning for visual intelligence: A review and new outlooks

L Wang, KJ Yoon - IEEE transactions on pattern analysis and …, 2021 - ieeexplore.ieee.org
Deep neural models, in recent years, have been successful in almost every field, even
solving the most complex problems. However, these models are huge in size, with …
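
As context for the surveyed student-teacher methods, here is a minimal sketch of the classic soft-target distillation objective (Hinton et al., 2015) that this line of work builds on; the temperature T and weight alpha are illustrative hyperparameters, not values from the survey:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Classic soft-target knowledge distillation: KL between temperature-
    softened teacher and student distributions, plus cross-entropy on the
    ground-truth labels. T and alpha are illustrative, not from the survey."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # T**2 rescales gradients so the soft term stays comparable across temperatures
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```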

Leveraging recent advances in deep learning for audio-visual emotion recognition

L Schoneveld, A Othmani, H Abdelkawy - Pattern Recognition Letters, 2021 - Elsevier
Emotional expressions are the behaviors that communicate our emotional state or attitude to
others. They are expressed through verbal and non-verbal communication. Complex human …

AI models collapse when trained on recursively generated data

I Shumailov, Z Shumaylov, Y Zhao, N Papernot… - Nature, 2024 - nature.com
Stable Diffusion revolutionized image creation from descriptive text. GPT-2, GPT-3(.5)
and GPT-4 demonstrated high performance across a variety of language tasks …
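
The collapse phenomenon in the title can be illustrated with a toy recursion: repeatedly fit a Gaussian to samples drawn from the previous fit, and the estimated spread drifts toward zero as the tails are lost. A hypothetical NumPy sketch, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)

n_samples, generations = 100, 200
data = rng.normal(loc=0.0, scale=1.0, size=n_samples)  # generation 0: real data

for gen in range(1, generations + 1):
    mu, sigma = data.mean(), data.std()      # "train" a Gaussian on current data
    data = rng.normal(mu, sigma, n_samples)  # next generation sees only samples
    if gen % 50 == 0:
        print(f"generation {gen:3d}: std = {sigma:.3f}")
# The std drifts toward 0: each refit narrows the tails, a toy version of collapse
```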

YOLOv6: A single-stage object detection framework for industrial applications

C Li, L Li, H Jiang, K Weng, Y Geng, L Li, Z Ke… - arXiv preprint arXiv …, 2022 - arxiv.org
For years, the YOLO series has been the de facto industry standard for efficient object
detection. The YOLO community has grown tremendously, enriching its use in a …

R-drop: Regularized dropout for neural networks

L Wu, J Li, Y Wang, Q Meng, T Qin… - Advances in …, 2021 - proceedings.neurips.cc
Dropout is a powerful and widely used technique for regularizing the training of deep neural
networks. Though dropout is effective and performs well, the randomness it introduces …
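
R-Drop's core idea can be sketched in a few lines: feed the same batch through the network twice so dropout samples two sub-models, then add a symmetric KL penalty between their predictive distributions. A minimal PyTorch sketch; the weight alpha is illustrative and tuned per task in the paper:

```python
import torch
import torch.nn.functional as F

def r_drop_loss(model, x, labels, alpha=1.0):
    """R-Drop training objective for a classifier. The model must be in
    train mode so the two forward passes draw different dropout masks."""
    logits1, logits2 = model(x), model(x)  # two dropout-sampled sub-models
    ce = 0.5 * (F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels))
    p1 = F.log_softmax(logits1, dim=-1)
    p2 = F.log_softmax(logits2, dim=-1)
    # Symmetric (bidirectional) KL between the two predictive distributions
    kl = 0.5 * (F.kl_div(p1, p2, log_target=True, reduction="batchmean")
                + F.kl_div(p2, p1, log_target=True, reduction="batchmean"))
    return ce + alpha * kl
```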

Knowledge distillation with the reused teacher classifier

D Chen, JP Mei, H Zhang, C Wang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Knowledge distillation aims to compress a powerful yet cumbersome teacher model
into a lightweight student model without much sacrifice of performance. For this purpose …
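
One way the reuse idea can be rendered, under assumed feature dimensions (512 for the student, 2048 for the teacher, both hypothetical): the student backbone plus a small projector learns to mimic the teacher's penultimate features, and the teacher's frozen classification head does the predicting:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReusedClassifierStudent(nn.Module):
    """Sketch of reusing the teacher's classifier: the student only learns
    to match the teacher's penultimate features through a projector, while
    the teacher's frozen head is kept for prediction."""

    def __init__(self, student_backbone, teacher_classifier,
                 student_dim=512, teacher_dim=2048):
        super().__init__()
        self.backbone = student_backbone
        self.projector = nn.Linear(student_dim, teacher_dim)
        self.classifier = teacher_classifier
        for p in self.classifier.parameters():  # reuse the head, keep it frozen
            p.requires_grad = False

    def forward(self, x):
        feat = self.projector(self.backbone(x))
        return feat, self.classifier(feat)

def feature_mimic_loss(student_feat, teacher_feat):
    # The training signal is purely feature alignment with the teacher
    return F.mse_loss(student_feat, teacher_feat)
```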

Knowledge distillation: A survey

J Gou, B Yu, SJ Maybank, D Tao - International Journal of Computer Vision, 2021 - Springer
In recent years, deep neural networks have been successful in both industry and academia,
especially for computer vision tasks. The great success of deep learning is mainly due to its …

Towards understanding ensemble, knowledge distillation and self-distillation in deep learning

Z Allen-Zhu, Y Li - arXiv preprint arXiv:2012.09816, 2020 - arxiv.org
We formally study how ensembles of deep learning models can improve test accuracy, and
how the superior performance of an ensemble can be distilled into a single model using …
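
A toy rendering of the distillation setup the paper analyzes: the teacher signal is the average of the ensemble members' predictive distributions, and a single student is trained to match it. The temperature T is an illustrative knob:

```python
import torch
import torch.nn.functional as F

def ensemble_distill_loss(student_logits, member_logits_list, T=1.0):
    """Distill an ensemble into one model: the target is the mean of the
    members' (temperature-softened) predictive distributions."""
    with torch.no_grad():
        probs = torch.stack([F.softmax(l / T, dim=-1) for l in member_logits_list])
        ensemble_target = probs.mean(dim=0)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_student, ensemble_target, reduction="batchmean") * (T * T)
```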

Rethinking few-shot image classification: a good embedding is all you need?

Y Tian, Y Wang, D Krishnan, JB Tenenbaum… - Computer Vision–ECCV …, 2020 - Springer
The focus of recent meta-learning research has been on the development of learning
algorithms that can quickly adapt to test time tasks with limited data and low computational …
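
The baseline recipe the paper argues for fits in a few lines: freeze an embedding pretrained on the base classes, then fit a simple linear classifier on each task's support set. A sketch where `embed` is a stand-in for any pretrained feature extractor, with the identity map as a toy embedding:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def few_shot_classify(embed, support_x, support_y, query_x):
    """Embed the support and query sets with a frozen feature extractor,
    then fit a linear classifier on the few labeled support examples."""
    z_support = np.stack([embed(x) for x in support_x])
    z_query = np.stack([embed(x) for x in query_x])
    clf = LogisticRegression(max_iter=1000).fit(z_support, support_y)
    return clf.predict(z_query)

# Toy usage just to show the shapes of a 5-way 5-shot episode
rng = np.random.default_rng(0)
embed = lambda x: x                    # identity embedding over 64-d toy features
support_x = rng.normal(size=(25, 64))  # 5 classes x 5 shots
support_y = np.repeat(np.arange(5), 5)
query_x = rng.normal(size=(15, 64))
print(few_shot_classify(embed, support_x, support_y, query_x))
```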

Theoretical analysis of self-training with deep networks on unlabeled data

C Wei, K Shen, Y Chen, T Ma - arXiv preprint arXiv:2010.03622, 2020 - arxiv.org
Self-training algorithms, which train a model to fit pseudolabels predicted by another
previously-learned model, have been very successful for learning with unlabeled data using …
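
The self-training loop being analyzed can be sketched directly: the previously learned model assigns pseudolabels to unlabeled inputs, and only confident predictions are kept for training the next model. The 0.95 confidence threshold is illustrative:

```python
import torch
import torch.nn.functional as F

def pseudolabel_batch(teacher, unlabeled_x, threshold=0.95):
    """Data side of one self-training round: the previously learned model
    predicts labels for unlabeled inputs, and only predictions above the
    confidence threshold are kept as pseudolabels."""
    teacher.eval()
    with torch.no_grad():
        probs = F.softmax(teacher(unlabeled_x), dim=-1)
        conf, pseudo_y = probs.max(dim=-1)
    keep = conf >= threshold
    return unlabeled_x[keep], pseudo_y[keep]

# The student is then trained with cross-entropy on the kept pairs (alongside
# any labeled data) and can replace the teacher for the next round.
```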