A comprehensive survey on model quantization for deep neural networks in image classification
Recent advancements in machine learning achieved by Deep Neural Networks (DNNs)
have been significant. While demonstrating high accuracy, DNNs are associated with a …
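(Not from the survey itself: purely as a generic illustration of the uniform quantization such surveys cover, below is a minimal PyTorch sketch of symmetric per-tensor INT8 weight quantization. The function names and the 127-level symmetric grid are assumptions for illustration, not any particular method from the survey.)

```python
import torch

def quantize_int8(x: torch.Tensor):
    """Symmetric per-tensor INT8 quantization: x is approximated by scale * q, q in [-127, 127]."""
    scale = x.abs().max().clamp(min=1e-8) / 127.0   # map the largest magnitude onto 127
    q = torch.clamp(torch.round(x / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate float tensor from the integer codes."""
    return q.float() * scale

if __name__ == "__main__":
    w = torch.randn(64, 64)                 # stand-in for a layer's weight matrix
    q, scale = quantize_int8(w)
    err = (w - dequantize(q, scale)).abs().max()
    print(f"scale={scale:.5f}, max abs error={err:.5f}")
```

Per-channel scales and calibration of activation ranges are common refinements of this basic per-tensor scheme.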
A survey on transformer compression
Large models based on the Transformer architecture play increasingly vital roles in artificial
intelligence, particularly within the realms of natural language processing (NLP) and …
Unified data-free compression: Pruning and quantization without fine-tuning
S Bai, J Chen, X Shen, Y Qian… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Structured pruning and quantization are promising approaches for reducing the inference
time and memory footprint of neural networks. However, most existing methods require the …
Towards trustworthy dataset distillation
Efficiency and trustworthiness are two eternal pursuits when applying deep learning in
practical scenarios. Considering efficiency, dataset distillation (DD) endeavors to reduce …
Dual teachers for self-knowledge distillation
We introduce an efficient self-knowledge distillation framework, Dual Teachers for Self-
Knowledge Distillation (DTSKD), where the student receives self-supervisions by dual …
MCMC: Multi-Constrained Model Compression via One-Stage Envelope Reinforcement Learning
Model compression methods are being developed to bridge the gap between the massive
scale of neural networks and the limited hardware resources on edge devices. Since most …
Single-shot pruning and quantization for hardware-friendly neural network acceleration
B Jiang, J Chen, Y Liu - Engineering Applications of Artificial Intelligence, 2023 - Elsevier
Applying CNNs on embedded systems is challenging due to model size limitations. Pruning
and quantization can help, but are time-consuming to apply separately. Our Single-Shot …
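(Also not taken from the paper: as a rough illustration of combining the two operations this entry names, below is a minimal sketch of magnitude pruning followed by INT8 quantization of the surviving weights. The 70% sparsity setting and all names are assumptions, not the Single-Shot procedure itself.)

```python
import torch

def prune_and_quantize(w: torch.Tensor, sparsity: float = 0.5):
    """Magnitude pruning followed by symmetric INT8 quantization of the survivors."""
    # 1) Prune: zero out the smallest-magnitude weights until `sparsity` is reached.
    k = int(w.numel() * sparsity)
    threshold = w.abs().flatten().kthvalue(k).values if k > 0 else torch.tensor(0.0)
    mask = (w.abs() > threshold).float()
    w_pruned = w * mask
    # 2) Quantize: map the remaining weights onto an INT8 grid.
    scale = w_pruned.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(w_pruned / scale), -127, 127).to(torch.int8)
    return q, scale, mask

if __name__ == "__main__":
    w = torch.randn(128, 128)
    q, scale, mask = prune_and_quantize(w, sparsity=0.7)
    print(f"kept {int(mask.sum())}/{mask.numel()} weights, scale={scale:.5f}")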
MBQuant: A novel multi-branch topology method for arbitrary bit-width network quantization
Arbitrary bit-width network quantization has received significant attention due to its high
adaptability to various bit-width requirements during runtime. However, in this paper, we …
PIPE: Parallelized inference through ensembling of residual quantization expansions
Deep neural networks (DNNs) are ubiquitous in computer vision and natural language
processing, but suffer from high inference cost. This problem can be addressed by …
Dynamic instance-aware layer-bit-select network on human activity recognition using wearable sensors
During recent years, deep convolutional neural networks have achieved remarkable
success in a wide range of sensor-based human activity recognition (HAR) applications …