MJPNet-S*: Multistyle Joint-Perception Network with Knowledge Distillation for Drone RGB-Thermal Crowd Density Estimation in Smart Cities

W Zhou, X Yang, X Dong, M Fang… - IEEE Internet of Things …, 2024 - ieeexplore.ieee.org
Crowd density estimation has gained significant research interest owing to its potential in
various industries and social applications. Therefore, this article proposes a multistyle joint …

Frequency attention for knowledge distillation

C Pham, VA Nguyen, T Le, D Phung… - Proceedings of the …, 2024 - openaccess.thecvf.com
Knowledge distillation is an attractive approach for learning compact deep neural
networks, which learns a lightweight student model by distilling knowledge from a complex …
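For reference alongside the distillation entries in this list, the following is a minimal sketch of the standard temperature-scaled soft-label knowledge distillation objective. It is a generic illustration only, not the frequency-attention method of the cited paper; the function and tensor names are assumptions.

import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft-label term: KL divergence between temperature-scaled
    # teacher and student distributions, scaled by T^2 as is conventional.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard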

Multi-dataset fusion for multi-task learning on face attribute recognition

H Lu, S Xu, J Wang - Pattern Recognition Letters, 2023 - Elsevier
The goal of face attribute recognition (FAR) is to recognize the attributes of face images,
such as gender, race, etc. Multi-dataset fusion aims to train a network with multiple datasets …

Self-Supervised Quantization-Aware Knowledge Distillation

K Zhao, M Zhao - arXiv preprint arXiv:2403.11106, 2024 - arxiv.org
Quantization-aware training (QAT) and Knowledge Distillation (KD) are combined to achieve
competitive performance in creating low-bit deep learning models. However, existing works …
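A rough sketch of the combination the abstract describes: the student's weights are fake-quantized in the forward pass (with a straight-through estimator for gradients) while the student is trained with a distillation loss, such as the kd_loss sketch above, against a full-precision teacher. The layer, bit-width, and parameter names here are illustrative assumptions, not the cited method.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FakeQuantLinear(nn.Module):
    # Linear layer whose weights are fake-quantized to `bits` levels in the
    # forward pass; gradients flow through unchanged (straight-through estimator).
    def __init__(self, in_features, out_features, bits=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.levels = 2 ** bits - 1

    def forward(self, x):
        w = self.weight
        scale = w.abs().max() / (self.levels / 2) + 1e-8
        w_q = torch.round(w / scale).clamp(-self.levels // 2, self.levels // 2) * scale
        # Quantized weights in the forward pass, identity gradient in backward.
        w_ste = w + (w_q - w).detach()
        return F.linear(x, w_ste, self.bias)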

PURF: Improving teacher representations by imposing smoothness constraints for knowledge distillation

MI Hossain, S Akhter, CS Hong, EN Huh - Applied Soft Computing, 2024 - Elsevier
Knowledge distillation is one of the most persuasive approaches to model
compression that transfers the representational expertise from large deep-learning teacher …

Difference-Aware Distillation for Semantic Segmentation

J Gou, X Zhou, L Du, Y Zhan… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
In recent years, various distillation methods for semantic segmentation have been proposed.
However, these methods typically train the student model to imitate the intermediate features …

DistillSleepNet: Heterogeneous Multi-Level Knowledge Distillation via Teacher Assistant for Sleep Staging

Z Jia, H Liang, Y Liu, H Wang… - IEEE Transactions on Big …, 2024 - ieeexplore.ieee.org
Accurate sleep staging is crucial for the diagnosis of diseases such as sleep disorders.
Existing sleep staging models with excellent performance are usually large and require a lot …

DSP-KD: dual-stage progressive knowledge distillation for skin disease classification

X Zeng, Z Ji, H Zhang, R Chen, Q Liao, J Wang, T Lyu… - Bioengineering, 2024 - mdpi.com
The increasing global demand for skin disease diagnostics emphasizes the urgent need for
advancements in AI-assisted diagnostic technologies for dermatoscopic images. In current …

Quantized Graph Neural Networks for Image Classification

X Xu, L Ma, T Zeng, Q Huang - Mathematics, 2023 - mdpi.com
Researchers have resorted to model quantization to compress and accelerate graph neural
networks (GNNs). Nevertheless, several challenges remain: (1) quantization functions …

FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation

L Ma, M Sun, Z Shen - arXiv preprint arXiv:2407.07093, 2024 - arxiv.org
This work presents a Fully BInarized Large Language Model (FBI-LLM), demonstrating for
the first time how to train a large-scale binary language model from scratch (not the partial …