A Survey of Design and Optimization for Systolic Array-based DNN Accelerators

R Xu, S Ma, Y Guo, D Li - ACM Computing Surveys, 2023 - dl.acm.org
In recent years, it has been witnessed that the systolic array is a successful architecture for
DNN hardware accelerators. However, the design of systolic arrays also encountered many …

Recent developments in low-power AI accelerators: A survey

C Åleskog, H Grahn, A Borg - Algorithms, 2022 - mdpi.com
As machine learning and AI continue to rapidly develop, and with the ever-closer end of
Moore's law, new avenues and novel ideas in architecture design are being created and …

Vegeta: Vertically-integrated extensions for sparse/dense GEMM tile acceleration on CPUs

G Jeong, S Damani, AR Bambhaniya… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Deep Learning (DL) acceleration support in CPUs has recently gained a lot of traction, with
several companies (Arm, Intel, IBM) announcing products with specialized matrix engines …

GPGCN: A general-purpose graph convolution neural network accelerator based on RISC-V ISA extension

W Tang, P Zhang - Electronics, 2022 - mdpi.com
In the past two years, various graph convolution neural networks (GCNs) accelerators have
emerged, each with their own characteristics, but their common disadvantage is that the …

CUTE: A scalable CPU-centric and Ultra-utilized Tensor Engine for convolutions

W Li, J Ye, F Zhang, T Liu, T Zhang, J Wang - Journal of Systems …, 2024 - Elsevier
Convolution is a fundamental and computationally expensive primitive that is ubiquitous
in deep neural networks (DNNs). The evolving DNNs have spurred the emergence of …

SLIDEX: A Novel Architecture for Sliding Window Processing

R Taranco, JM Arnau, A González - Proceedings of the 38th ACM …, 2024 - dl.acm.org
Efficient image processing is increasingly crucial in constrained embedded and real-time
platforms, especially in emerging applications such as Autonomous Driving (AD) or …

LV: Latency-Versatile Floating-Point Engine for High-Performance Deep Neural Networks

YC Lo, YC Tsai, RS Liu - IEEE Computer Architecture Letters, 2023 - ieeexplore.ieee.org
Computing latency is an important system metric for Deep Neural Networks (DNNs)
accelerators. To reduce latency, this work proposes LV, a latency-versatile floating-point …

Abstracting Sparse DNN Acceleration via Structured Sparse Tensor Decomposition

G Jeong, PA Tsai, AR Bambhaniya, SW Keckler… - arXiv preprint arXiv …, 2024 - arxiv.org
Exploiting sparsity in deep neural networks (DNNs) has been a promising area to meet the
growing computation need of modern DNNs. However, in practice, sparse DNN acceleration …

Efficient Convolutional Dataflows on Low-Power Neural Network Accelerators

L Orosa, S Koppula, Y Umuroglu… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
Dilated and transposed convolutions are widely used in modern convolutional neural
networks (CNNs). These kernels are used extensively during CNN training and inference of …

All-rounder: A flexible DNN accelerator with diverse data format support

SH Noh, S Lee, B Shin, S Park, Y Jang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recognizing the explosive increase in the use of DNN-based applications, several industrial
companies developed a custom ASIC (e.g., Google TPU, IBM RaPiD, Intel NNP-I/NNP-T) and …