RASA: Efficient register-aware systolic array matrix engine for CPU

R Xu, S Ma, Y Guo, D Li - ACM Computing Surveys, 2023 - dl.acm.org

In recent years, it has been witnessed that the systolic array is a successful architecture for
DNN hardware accelerators. However, the design of systolic arrays also encountered many …

Cited by 22 · Related articles · All 2 versions

Recent developments in low-power AI accelerators: A survey

C Åleskog, H Grahn, A Borg - Algorithms, 2022 - mdpi.com

As machine learning and AI continue to rapidly develop, and with the ever-closer end of
Moore's law, new avenues and novel ideas in architecture design are being created and …

Cited by 12 · Related articles · All 4 versions

VEGETA: Vertically-integrated extensions for sparse/dense GEMM tile acceleration on CPUs

G Jeong, S Damani, AR Bambhaniya… - … Symposium on High …, 2023 - ieeexplore.ieee.org

Deep Learning (DL) acceleration support in CPUs has recently gained a lot of traction, with
several companies (Arm, Intel, IBM) announcing products with specialized matrix engines …

Cited by 13 · Related articles · All 4 versions

GPGCN: A general-purpose graph convolution neural network accelerator based on RISC-V ISA extension

W Tang, P Zhang - Electronics, 2022 - mdpi.com

In the past two years, various graph convolution neural networks (GCNs) accelerators have
emerged, each with their own characteristics, but their common disadvantage is that the …

Cited by 4 · Related articles · All 3 versions

CUTE: A scalable CPU-centric and Ultra-utilized Tensor Engine for convolutions

W Li, J Ye, F Zhang, T Liu, T Zhang, J Wang - Journal of Systems …, 2024 - Elsevier

Convolution is a fundamental and computationally expensive primitive that is ubiquitous
in deep neural networks (DNNs). The evolving DNNs have spurred the emergence of …

SLIDEX: A Novel Architecture for Sliding Window Processing

R Taranco, JM Arnau, A González - Proceedings of the 38th ACM …, 2024 - dl.acm.org

Efficient image processing is increasingly crucial in constrained embedded and real-time
platforms, especially in emerging applications such as Autonomous Driving (AD) or …

LV: Latency-Versatile Floating-Point Engine for High-Performance Deep Neural Networks

YC Lo, YC Tsai, RS Liu - IEEE Computer Architecture Letters, 2023 - ieeexplore.ieee.org

Computing latency is an important system metric for Deep Neural Networks (DNNs)
accelerators. To reduce latency, this work proposes LV, a latency-versatile floating-point …

Cited by 2 · Related articles · All 4 versions

Abstracting Sparse DNN Acceleration via Structured Sparse Tensor Decomposition

G Jeong, PA Tsai, AR Bambhaniya, SW Keckler… - arXiv preprint arXiv …, 2024 - arxiv.org

Exploiting sparsity in deep neural networks (DNNs) has been a promising area to meet the
growing computation need of modern DNNs. However, in practice, sparse DNN acceleration …

Efficient Convolutional Dataflows on Low-Power Neural Network Accelerators

L Orosa, S Koppula, Y Umuroglu… - IEEE Transactions …, 2023 - ieeexplore.ieee.org

Dilated and transposed convolutions are widely used in modern convolutional neural
networks (CNNs). These kernels are used extensively during CNN training and inference of …

Cited by 3 · Related articles · All 7 versions

All-rounder: A flexible DNN accelerator with diverse data format support

SH Noh, S Lee, B Shin, S Park, Y Jang… - arXiv preprint arXiv …, 2023 - arxiv.org

Recognizing the explosive increase in the use of DNN-based applications, several industrial
companies developed a custom ASIC (e.g., Google TPU, IBM RaPiD, Intel NNP-I/NNP-T) and …