A comprehensive survey on model quantization for deep neural networks in image classification
Recent advancements in machine learning achieved by Deep Neural Networks (DNNs)
have been significant. While demonstrating high accuracy, DNNs are associated with a …
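(Not from the survey itself: purely as a generic illustration of the uniform quantization such surveys cover, below is a minimal PyTorch sketch of symmetric per-tensor INT8 weight quantization. The function names and the 127-level symmetric grid are assumptions for illustration, not any particular method from the survey.)

```python
import torch

def quantize_int8(x: torch.Tensor):
    """Symmetric per-tensor INT8 quantization: x is approximated by scale * q, q in [-127, 127]."""
    scale = x.abs().max().clamp(min=1e-8) / 127.0   # map the largest magnitude onto 127
    q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate float tensor from the integer codes."""
    return q.float() * scale

if __name__ == "__main__":
    w = torch.randn(64, 64)                 # stand-in for a layer's weight matrix
    q, scale = quantize_int8(w)
    err = (w - dequantize(q, scale)).abs().max()
    print(f"scale={scale:.5f}, max abs error={err:.5f}")
```

Per-channel scales and calibration of activation ranges are common refinements of this basic per-tensor scheme.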
A survey on transformer compression
Large models based on the Transformer architecture play increasingly vital roles in artificial
intelligence, particularly within the realms of natural language processing (NLP) and …
Unified data-free compression: Pruning and quantization without fine-tuning
S Bai, J Chen, X Shen, Y Qian… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Structured pruning and quantization are promising approaches for reducing the inference
time and memory footprint of neural networks. However, most existing methods require the …
Towards trustworthy dataset distillation
Efficiency and trustworthiness are two eternal pursuits when applying deep learning in
practical scenarios. Considering efficiency, dataset distillation (DD) endeavors to reduce …
Dual teachers for self-knowledge distillation
We introduce an efficient self-knowledge distillation framework, Dual Teachers for Self-
Knowledge Distillation (DTSKD), where the student receives self-supervisions by dual …
MCMC: Multi-Constrained Model Compression via One-Stage Envelope Reinforcement Learning
Model compression methods are being developed to bridge the gap between the massive
scale of neural networks and the limited hardware resources on edge devices. Since most …
Single-shot pruning and quantization for hardware-friendly neural network acceleration
B Jiang, J Chen, Y Liu - Engineering Applications of Artificial Intelligence, 2023 - Elsevier
Applying CNNs on embedded systems is challenging due to model size limitations. Pruning
and quantization can help, but are time-consuming to apply separately. Our Single-Shot …
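(Also not taken from the paper: as a rough illustration of combining the two operations this entry names, below is a minimal sketch of magnitude pruning followed by INT8 quantization of the surviving weights. The 70% sparsity setting and all names are assumptions, not the Single-Shot procedure itself.)

```python
import torch

def prune_and_quantize(w: torch.Tensor, sparsity: float = 0.5):
    """Magnitude pruning followed by symmetric INT8 quantization of the survivors."""
    # 1) Prune: zero out the smallest-magnitude weights until `sparsity` is reached.
    k = int(w.numel() * sparsity)
    threshold = w.abs().flatten().kthvalue(k).values if k > 0 else torch.tensor(0.0)
    mask = (w.abs() > threshold).float()
    w_pruned = w * mask
    # 2) Quantize: map the remaining weights onto an INT8 grid.
    scale = w_pruned.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(w_pruned / scale), -127, 127).to(torch.int8)
    return q, scale, mask

if __name__ == "__main__":
    w = torch.randn(128, 128)
    q, scale, mask = prune_and_quantize(w, sparsity=0.7)
    print(f"kept {int(mask.sum())}/{mask.numel()} weights, scale={scale:.5f}")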
MBQuant: A novel multi-branch topology method for arbitrary bit-width network quantization
Arbitrary bit-width network quantization has received significant attention due to its high
adaptability to various bit-width requirements during runtime. However, in this paper, we …
PIPE: Parallelized inference through ensembling of residual quantization expansions
Deep neural networks (DNNs) are ubiquitous in computer vision and natural language
processing, but suffer from high inference cost. This problem can be addressed by …
Dynamic instance-aware layer-bit-select network on human activity recognition using wearable sensors
During recent years, deep convolutional neural networks have achieved remarkable
success in a wide range of sensor-based human activity recognition (HAR) applications …