Pruning and quantization for deep neural network acceleration: A survey
T Liang, J Glossner, L Wang, S Shi, X Zhang - Neurocomputing, 2021 - Elsevier
Deep neural networks have been applied in many applications, exhibiting extraordinary abilities in the field of computer vision. However, complex network architectures challenge …
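As context for the two techniques this survey covers, below is a minimal numpy sketch (not from the paper; all function names and thresholds are illustrative) of magnitude-based weight pruning followed by symmetric uniform quantization:

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights (illustrative)."""
    k = int(np.ceil(sparsity * w.size))
    if k == 0:
        return w.copy()
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

def uniform_quantize(w, num_bits=8):
    """Symmetric uniform quantization to num_bits integers, plus the scale."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = np.max(np.abs(w))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

w = np.random.randn(4, 4).astype(np.float32)
w_sparse = magnitude_prune(w, sparsity=0.5)
q, scale = uniform_quantize(w_sparse, num_bits=8)
w_hat = q.astype(np.float32) * scale   # dequantized approximation
print("max reconstruction error:", np.abs(w_sparse - w_hat).max())
```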
ReCU: Reviving the dead weights in binary neural networks
Binary neural networks (BNNs) have received increasing attention due to their superior
reductions of computation and memory. Most existing works focus on either lessening the …
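For background, most BNNs build on scaled-sign binarization, W ≈ α·sign(W) with α = mean(|W|), the scale that minimizes the L2 binarization error. The sketch below shows only this generic baseline, not ReCU's specific weight-reviving scheme:

```python
import numpy as np

def binarize(w):
    """Scaled-sign binarization: W ~= alpha * sign(W).
    alpha = mean(|W|) minimizes ||W - alpha * B||_2 for B = sign(W)."""
    alpha = np.mean(np.abs(w))        # optimal per-tensor scale
    b = np.where(w >= 0, 1.0, -1.0)   # sign(0) mapped to +1
    return alpha * b

w = np.random.randn(3, 3).astype(np.float32)
print(binarize(w))
```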
Mandheling: Mixed-precision on-device DNN training with DSP offloading
This paper proposes Mandheling, the first system that enables highly resource-efficient on-
device training by orchestrating mixed-precision training with on-chip Digital Signal …
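As a rough illustration of the mixed-precision pattern such systems exploit (low-precision forward compute, full-precision master weights and updates), here is a generic numpy sketch; Mandheling's actual DSP kernels and scheduling are not modeled here:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization (generic, illustrative)."""
    scale = np.max(np.abs(x)) / 127.0 + 1e-12
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w_fp32 = rng.standard_normal((8, 4)).astype(np.float32)  # fp32 master weights
x = rng.standard_normal((16, 8)).astype(np.float32)
y = rng.standard_normal((16, 4)).astype(np.float32)

for _ in range(10):
    qw, sw = quantize_int8(w_fp32)
    qx, sx = quantize_int8(x)
    # Low-precision forward pass: int8 operands, int32 accumulation, one rescale.
    y_hat = (qx.astype(np.int32) @ qw.astype(np.int32)).astype(np.float32) * (sx * sw)
    grad = x.T @ (y_hat - y) / len(x)   # gradient and update kept in fp32
    w_fp32 -= 0.05 * grad
```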
Cambricon-Q: A hybrid architecture for efficient training
Deep neural network (DNN) training is notoriously time-consuming, and quantization is
a promising way to improve training efficiency with reduced bandwidth/storage requirements …
SiMaN: Sign-to-magnitude network binarization
Binary neural networks (BNNs) have attracted broad research interest due to their efficient
storage and computational ability. Nevertheless, a significant challenge of BNNs lies in …
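The computational efficiency mentioned here comes from the standard BNN inference kernel, in which a dot product of {-1, +1} vectors reduces to XNOR (or XOR) plus popcount; a minimal sketch of that generic trick (not SiMaN's contribution) follows:

```python
import numpy as np

def binary_dot(a_bits, b_bits, n):
    """Dot product of two {-1,+1} vectors packed as 0/1 bits:
    dot = n - 2 * popcount(a XOR b)  (equivalently 2 * popcount(XNOR) - n)."""
    disagreements = bin(a_bits ^ b_bits).count("1")
    return n - 2 * disagreements

rng = np.random.default_rng(1)
n = 16
a = rng.choice([-1, 1], size=n)
b = rng.choice([-1, 1], size=n)

# Pack +1 -> bit 1, -1 -> bit 0 into plain Python integers.
pack = lambda v: int("".join('1' if s > 0 else '0' for s in v), 2)
assert binary_dot(pack(a), pack(b), n) == int(np.dot(a, b))
```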
Acceleration unit for a deep learning engine
Embodiments of a device include an integrated circuit, a reconfigurable stream switch
formed in the integrated circuit along with a plurality of convolution accelerators and an …
Arithmetic unit for deep learning acceleration
Embodiments of a device include an integrated circuit, a reconfigurable stream switch formed in the integrated circuit, and an arithmetic unit coupled to the …
SupeRBNN: Randomized binary neural network using adiabatic superconductor Josephson devices
Adiabatic Quantum-Flux-Parametron (AQFP) is a superconducting logic with extremely high
energy efficiency. By employing the distinct polarity of current to denote logic '0' and '1', AQFP …
Accurate neural training with 4-bit matrix multiplications at standard formats
Quantization of the weights and activations is one of the main methods to reduce the computational footprint of Deep Neural Network (DNN) training. Current methods enable 4 …
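For a sense of what 4-bit matrix multiplication looks like numerically, here is a generic round-to-nearest symmetric INT4 sketch (the paper's specific formats and gradient handling are not reproduced):

```python
import numpy as np

def quantize_int4(x):
    """Symmetric round-to-nearest INT4: integer values in [-8, 7] (illustrative)."""
    scale = np.max(np.abs(x)) / 7.0 + 1e-12
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)  # stored in int8
    return q, scale

rng = np.random.default_rng(2)
a = rng.standard_normal((4, 16)).astype(np.float32)   # activations
w = rng.standard_normal((16, 8)).astype(np.float32)   # weights

qa, sa = quantize_int4(a)
qw, sw = quantize_int4(w)
# Integer matmul with int32 accumulation, then a single dequantization rescale.
y_int4 = (qa.astype(np.int32) @ qw.astype(np.int32)).astype(np.float32) * (sa * sw)
y_fp32 = a @ w
print("max abs error vs fp32:", np.abs(y_int4 - y_fp32).max())
```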
Fantastic4: A hardware-software co-design approach for efficiently running 4bit-compact multilayer perceptrons
S Wiedemann, S Shivapakash… - IEEE Open Journal …, 2021 - ieeexplore.ieee.org
With the growing demand for deploying Deep Learning models to the “edge”, it is paramount
to develop techniques that allow executing state-of-the-art models within very tight and …