Hardware acceleration of sparse and irregular tensor computations of ml models: A survey and insights

S Dave, R Baghdadi, T Nowatzki… - Proceedings of the …, 2021 - ieeexplore.ieee.org
Machine learning (ML) models are widely used in many important domains. For efficiently
processing these computational- and memory-intensive applications, tensors of these …

Snap: An efficient sparse neural acceleration processor for unstructured sparse deep neural network inference

JF Zhang, CE Lee, C Liu, YS Shao… - IEEE Journal of Solid …, 2020 - ieeexplore.ieee.org
Recent developments in deep neural network (DNN) pruning introduce data sparsity to
enable deep learning applications to run more efficiently on resource- and energy …

Spada: Accelerating sparse matrix multiplication with adaptive dataflow

Z Li, J Li, T Chen, D Niu, H Zheng, Y Xie… - Proceedings of the 28th …, 2023 - dl.acm.org
Sparse matrix-matrix multiplication (SpGEMM) is widely used in many scientific and deep
learning applications. The highly irregular structures of SpGEMM limit its performance and …

Hardware accelerator design for sparse dnn inference and training: A tutorial

W Mao, M Wang, X Xie, X Wu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Deep neural networks (DNNs) are widely used in many fields, such as artificial intelligence
generated content (AIGC) and robotics. To efficiently support these tasks, the model pruning …

Z-PIM: A sparsity-aware processing-in-memory architecture with fully variable weight bit-precision for energy-efficient deep neural networks

JH Kim, J Lee, J Lee, J Heo… - IEEE Journal of Solid-State …, 2021 - ieeexplore.ieee.org
We present an energy-efficient processing-in-memory (PIM) architecture named Z-PIM that
supports both sparsity handling and fully variable bit-precision in weight data for energy …

CNN inference using a preprocessing precision controller and approximate multipliers with various precisions

I Hammad, L Li, K El-Sankary, WM Snelgrove - IEEE Access, 2021 - ieeexplore.ieee.org
This article proposes boosting the multiplication performance for convolutional neural
network (CNN) inference using a precision prediction preprocessor which controls various …

Memory-efficient CNN accelerator based on interlayer feature map compression

Z Shao, X Chen, L Du, L Chen, Y Du… - … on Circuits and …, 2021 - ieeexplore.ieee.org
Existing deep convolutional neural networks (CNNs) generate massive interlayer feature
data during network inference. To maintain real-time processing in embedded systems …

An efficient unstructured sparse convolutional neural network accelerator for wearable ECG classification device

J Lu, D Liu, X Cheng, L Wei, A Hu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Convolutional neural network (CNN) with pruning techniques has shown remarkable
prospects in electrocardiogram (ECG) classification. However, efficiently deploying the …

CUTIE: Beyond PetaOp/s/W ternary DNN inference acceleration with better-than-binary energy efficiency

M Scherer, G Rutishauser, L Cavigelli… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
We present a 3.1 POp/s/W fully digital hardware accelerator for ternary neural networks
(TNNs). CUTIE, the completely unrolled ternary inference engine, focuses on minimizing …

Trainer: An energy-efficient edge-device training processor supporting dynamic weight pruning

Y Wang, Y Qin, D Deng, J Wei, T Chen… - IEEE Journal of Solid …, 2022 - ieeexplore.ieee.org
Transfer learning, which transfers knowledge from source datasets to target datasets, is
practical for adaptive deep neural network (DNN) applications. When considering user …