FACT: FFN-Attention Co-optimized Transformer Architecture with Eager Correlation Prediction

Y Qin, Y Wang, D Deng, Z Zhao, X Yang, L Liu… - Proceedings of the 50th …, 2023 - dl.acm.org
The Transformer model is becoming prevalent in various AI applications thanks to its outstanding
performance. However, its high computation cost and memory footprint make its …

Sextans: A streaming accelerator for general-purpose sparse-matrix dense-matrix multiplication

L Song, Y Chi, A Sohrabizadeh, Y Choi, J Lau… - Proceedings of the …, 2022 - dl.acm.org
Sparse-Matrix Dense-Matrix multiplication (SpMM) is the key operator for a wide range of
applications including scientific computing, graph processing, and deep learning …
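
The SpMM studied here computes C = A × B with A sparse and B dense. As a point of reference for the functional semantics only (a minimal sketch with a hypothetical helper spmm_csr, not Sextans' streaming dataflow):

```python
import numpy as np

def spmm_csr(indptr, indices, data, B):
    """C = A @ B where A (m x k) is sparse in CSR form and B (k x n) is dense.
    indptr/indices/data are the standard CSR arrays."""
    m, n = len(indptr) - 1, B.shape[1]
    C = np.zeros((m, n))
    for i in range(m):                           # each sparse row of A
        for p in range(indptr[i], indptr[i + 1]):
            C[i] += data[p] * B[indices[p]]      # scale-and-add a dense row of B
    return C
```

Streaming accelerators such as Sextans restructure exactly this row-wise scale-and-add loop for parallel hardware; the sketch only pins down what is being computed.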

RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration

G Huang, Z Wang, PA Tsai, C Zhang, Y Ding… - Proceedings of the 56th …, 2023 - dl.acm.org
This paper proposes RM-STC, a novel GPU tensor core architecture designed for sparse
Deep Neural Networks (DNNs) with two key innovations: (1) native support for both training …

ASA: Accelerating Sparse Accumulation in Column-wise SpGEMM

C Zhang, M Bremer, C Chan, J Shalf… - ACM Transactions on …, 2022 - dl.acm.org
Sparse linear algebra is an important kernel in many different applications. Among various
sparse general matrix-matrix multiplication (SpGEMM) algorithms, Gustavson's column-wise …
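
In Gustavson's column-wise formulation, column j of C is built as a linear combination of the columns of A selected by the nonzeros of B's column j, merged through a sparse accumulator (SPA). A minimal sketch, using dict-of-dicts as a stand-in for a compressed column format (illustrative only, not the ASA hardware design):

```python
def spgemm_columnwise(A_cols, B_cols):
    """C = A @ B, Gustavson column-wise.
    A_cols/B_cols: {col index: {row index: value}} (a CSC stand-in)."""
    C_cols = {}
    for j, b_col in B_cols.items():
        spa = {}                               # sparse accumulator for C[:, j]
        for k, b_kj in b_col.items():          # each nonzero B[k, j] ...
            for i, a_ik in A_cols.get(k, {}).items():
                spa[i] = spa.get(i, 0.0) + a_ik * b_kj   # ... scales A[:, k]
        C_cols[j] = spa
    return C_cols
```

The sparse accumulation step (the spa updates above) is the bottleneck that ASA, per its title, sets out to accelerate.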

HIRAC: A hierarchical accelerator with sorting-based packing for SpGEMMs in DNN applications

H Shabani, A Singh, B Youhana… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
State-of-the-art deep neural network (DNN) models use pruning to avoid over-fitting and
reduce the number of parameters. To improve storage and computational efficiency …

HARP: Hardware-Based Pseudo-Tiling for Sparse Matrix Multiplication Accelerator

J Kim, M Jang, H Nam, S Kim - Proceedings of the 56th Annual IEEE …, 2023 - dl.acm.org
General sparse matrix-matrix multiplication (SpGEMM) is a memory-bound workload due to
the compression format used. To minimize data movement for the input matrices, outer product …
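
The outer-product formulation referred to here decomposes C = A × B into a sum of rank-1 updates, C = Σ_k A[:, k] B[k, :], so each input element is read only once at the cost of many partial products to merge. A minimal sketch, again with dictionaries standing in for compressed formats (illustrative, not HARP's pseudo-tiling scheme):

```python
def spgemm_outer(A_cols, B_rows):
    """C = A @ B as a sum of rank-1 outer products over the shared dimension k.
    A_cols: {k: {i: A[i, k]}} (CSC-like); B_rows: {k: {j: B[k, j]}} (CSR-like)."""
    C = {}
    for k in A_cols.keys() & B_rows.keys():    # only k nonzero on both sides
        for i, a_ik in A_cols[k].items():
            for j, b_kj in B_rows[k].items():
                C[(i, j)] = C.get((i, j), 0.0) + a_ik * b_kj
    return C
```

Merging the (i, j) partial products is where the memory traffic concentrates, which is what tiling schemes such as HARP's aim to contain.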

High-efficiency Compressor Trees for Latest AMD FPGAs

K Hoßfeld, HJ Damsgaard, J Nurmi, M Blott… - ACM Transactions on …, 2024 - dl.acm.org
High-fan-in dot product computations are ubiquitous in highly relevant application domains
such as signal processing and machine learning. In particular, the diverse set of data formats …
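
A compressor tree sums many operands in carry-save form, deferring the expensive carry propagation to a single final adder. A minimal Python model of a 3:2 compressor tree (bit-parallel full adders over integers; a behavioral sketch, not an FPGA mapping) shows the idea:

```python
def compress_3to2(a, b, c):
    """One 3:2 compressor layer: a + b + c == sum_bits + carry_bits."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1

def compressor_tree_sum(operands):
    """Reduce many addends with 3:2 compressors; one final carry-propagate add."""
    ops = list(operands)
    while len(ops) > 2:
        reduced = []
        while len(ops) >= 3:
            s, c = compress_3to2(ops.pop(), ops.pop(), ops.pop())
            reduced += [s, c]
        ops = reduced + ops            # carry over any leftover operands
    return sum(ops)                    # final carry-propagate addition

# A high-fan-in dot product reduces its partial products this way, e.g.:
# compressor_tree_sum(x * w for x, w in zip(xs, ws))
```

Each 3:2 layer turns three addends into two without propagating carries, which is what makes the tree shallow and fast in hardware.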

Eureka: Efficient Tensor Cores for One-sided Unstructured Sparsity in DNN Inference

A Gondimalla, M Thottethodi… - Proceedings of the 56th …, 2023 - dl.acm.org
Deep neural networks (DNNs), while enormously popular, continue to place ever-higher
compute demands, for which GPUs provide specialized matrix multipliers called tensor cores …

Griffin: Rethinking sparse optimization for deep learning architectures

JH Shin, A Shafiee, A Pedram… - … Symposium on High …, 2022 - ieeexplore.ieee.org
This paper examines the design-space trade-offs of DNN accelerators aiming to achieve
competitive performance and efficiency metrics for all four combinations of dense or sparse …

FEASTA: A Flexible and Efficient Accelerator for Sparse Tensor Algebra in Machine Learning

K Zhong, Z Zhu, G Dai, H Wang, X Yang… - Proceedings of the 29th …, 2024 - dl.acm.org
Recently, sparse tensor algebra (SpTA) has come to play an increasingly important role in machine
learning. However, due to the unstructured sparsity of SpTA, the general-purpose …