FACT: FFN-Attention Co-optimized Transformer Architecture with Eager Correlation Prediction
The Transformer model is becoming prevalent in various AI applications thanks to its outstanding
performance. However, its high computation cost and memory footprint make its …
Sextans: A streaming accelerator for general-purpose sparse-matrix dense-matrix multiplication
Sparse-Matrix Dense-Matrix multiplication (SpMM) is the key operator for a wide range of
applications including scientific computing, graph processing, and deep learning …
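As a minimal software sketch of the SpMM operator this entry targets (illustrative only: the CSR layout is standard, but the function name and example matrices are my own, not from the paper):

```python
import numpy as np

def spmm_csr(indptr, indices, data, B):
    """Multiply a CSR sparse matrix A by a dense matrix B: C = A @ B."""
    n_rows = len(indptr) - 1
    C = np.zeros((n_rows, B.shape[1]))
    for i in range(n_rows):
        # Accumulate only the nonzeros of row i of A.
        for k in range(indptr[i], indptr[i + 1]):
            C[i] += data[k] * B[indices[k]]
    return C

# A = [[2, 0, 0],
#      [0, 0, 3]] in CSR form
indptr  = np.array([0, 1, 2])
indices = np.array([0, 2])
data    = np.array([2.0, 3.0])
B = np.arange(6.0).reshape(3, 2)  # dense operand

print(spmm_csr(indptr, indices, data, B))  # equals A @ B = [[0, 2], [12, 15]]
```

The irregular, row-dependent accesses to `B` are what make SpMM memory-bound and what streaming accelerators like Sextans reorganize.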
RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration
This paper proposes RM-STC, a novel GPU tensor core architecture designed for sparse
Deep Neural Networks (DNNs) with two key innovations: (1) native support for both training …
ASA: Accelerating Sparse Accumulation in Column-wise SpGEMM
Sparse linear algebra is an important kernel in many different applications. Among various
sparse general matrix-matrix multiplication (SpGEMM) algorithms, Gustavson's column-wise …
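Gustavson's column-wise dataflow, which this entry builds on, can be sketched as follows (a behavioral model under my own assumptions: columns as dicts and a hash-based sparse accumulator, which is the accumulation step ASA accelerates in hardware):

```python
def spgemm_columnwise(A_cols, B_cols):
    """Gustavson's column-wise SpGEMM: column j of C = A @ B is formed by
    scaling the columns of A selected by the nonzeros of B[:, j] and merging
    them. Each column is a dict mapping row index -> value."""
    C_cols = []
    for bcol in B_cols:
        acc = {}  # sparse accumulator: the merge step dominates runtime
        for k, b_kj in bcol.items():          # nonzeros of B's column j
            for i, a_ik in A_cols[k].items():  # scaled column k of A
                acc[i] = acc.get(i, 0.0) + a_ik * b_kj
        C_cols.append(acc)
    return C_cols

# A = [[1, 0], [0, 2]],  B = [[0, 3], [4, 0]]  ->  C = [[0, 3], [8, 0]]
A_cols = [{0: 1.0}, {1: 2.0}]
B_cols = [{1: 4.0}, {0: 3.0}]
print(spgemm_columnwise(A_cols, B_cols))  # [{1: 8.0}, {0: 3.0}]
```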
Hirac: A Hierarchical Accelerator with Sorting-based Packing for SpGEMMs in DNN Applications
The state-of-the-art deep neural network (DNN) models use pruning to avoid over-fitting and
reduce the number of parameters. In order to improve storage and computational efficiency …
HARP: Hardware-Based Pseudo-Tiling for Sparse Matrix Multiplication Accelerator
General sparse matrix-matrix multiplication (SpGEMM) is a memory-bound workload, due to
the compression format used. To minimize data movements for input matrices, outer product …
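The outer-product formulation this entry refers to computes C as a sum of rank-1 partial products, one per inner-dimension index. A minimal sketch (the data layout and function name are my assumptions, not HARP's):

```python
import numpy as np

def spgemm_outer(A_cols, B_rows, shape):
    """Outer-product SpGEMM: C = sum_k outer(A[:, k], B[k, :]).
    A_cols[k] and B_rows[k] hold (indices, values) for column k of A and
    row k of B; each pairing yields one rank-1 partial-product matrix."""
    C = np.zeros(shape)
    for (ai, av), (bj, bv) in zip(A_cols, B_rows):
        for i, a in zip(ai, av):
            for j, b in zip(bj, bv):
                C[i, j] += a * b  # merging partial products drives data movement
    return C

# A = [[1, 0], [0, 2]],  B = [[0, 3], [4, 0]]
A_cols = [([0], [1.0]), ([1], [2.0])]
B_rows = [([1], [3.0]), ([0], [4.0])]
print(spgemm_outer(A_cols, B_rows, (2, 2)))  # equals A @ B = [[0, 3], [8, 0]]
```

Each input nonzero is read exactly once, which is the appeal of the outer-product dataflow; the cost shifts to merging the partial-product matrices.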
High-efficiency Compressor Trees for Latest AMD FPGAs
High-fan-in dot product computations are ubiquitous in highly relevant application domains,
such as signal processing and machine learning. Particularly, the diverse set of data formats …
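The compressor trees this entry optimizes reduce many addends (e.g. dot-product partial products) with 3:2 compressors before a single carry-propagate add. A bit-level behavioral model, assuming nonnegative integer operands (the reduction order here is arbitrary; the paper's contribution is mapping such trees to FPGA primitives):

```python
def csa(a, b, c):
    """3:2 compressor (carry-save adder): per bit, a + b + c = sum + 2*carry,
    so three operands become two with no carry propagation."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1

def compressor_tree_sum(operands):
    """Reduce a high-fan-in sum with 3:2 compressors until two words remain,
    then do one final carry-propagate add."""
    ops = list(operands)
    while len(ops) > 2:
        s, c = csa(ops.pop(), ops.pop(), ops.pop())
        ops += [s, c]
    return sum(ops)

print(compressor_tree_sum([3, 5, 7, 11, 13]))  # 39
```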
Eureka: Efficient Tensor Cores for One-sided Unstructured Sparsity in DNN Inference
A Gondimalla, M Thottethodi… - Proceedings of the 56th …, 2023 - dl.acm.org
Deep neural networks (DNNs), while enormously popular, continue to place ever higher
compute demand for which GPUs provide specialized matrix multipliers called tensor cores …
Griffin: Rethinking sparse optimization for deep learning architectures
This paper examines the design-space trade-offs of DNN accelerators aiming to achieve
competitive performance and efficiency metrics for all four combinations of dense or sparse …
FEASTA: A Flexible and Efficient Accelerator for Sparse Tensor Algebra in Machine Learning
Recently, sparse tensor algebra (SpTA) plays an increasingly important role in machine
learning. However, due to the unstructured sparsity of SpTA, the general-purpose …