FACT: FFN-Attention Co-optimized Transformer Architecture with Eager Correlation Prediction
The Transformer model is becoming prevalent in various AI applications thanks to its outstanding
performance. However, its high computation cost and memory footprint make its …
Sextans: A streaming accelerator for general-purpose sparse-matrix dense-matrix multiplication
Sparse-Matrix Dense-Matrix multiplication (SpMM) is the key operator for a wide range of
applications including scientific computing, graph processing, and deep learning …
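As a minimal software sketch of the SpMM operator this entry targets (illustrative only: the CSR layout is standard, but the function name and example matrices are my own, not from the paper):

```python
import numpy as np

def spmm_csr(indptr, indices, data, B):
    """Multiply a CSR sparse matrix A by a dense matrix B: C = A @ B."""
    n_rows = len(indptr) - 1
    C = np.zeros((n_rows, B.shape[1]))
    for i in range(n_rows):
        # Accumulate only the nonzeros of row i of A.
        for k in range(indptr[i], indptr[i + 1]):
            C[i] += data[k] * B[indices[k]]
    return C

# A = [[2, 0, 0],
#      [0, 0, 3]] in CSR form
indptr  = np.array([0, 1, 2])
indices = np.array([0, 2])
data    = np.array([2.0, 3.0])
B = np.arange(6.0).reshape(3, 2)  # dense operand

print(spmm_csr(indptr, indices, data, B))  # equals A @ B = [[0, 2], [12, 15]]
```

The irregular, row-dependent accesses to `B` are what make SpMM memory-bound and what streaming accelerators like Sextans reorganize.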
RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration
This paper proposes RM-STC, a novel GPU tensor core architecture designed for sparse
Deep Neural Networks (DNNs) with two key innovations: (1) native support for both training …
ASA: Accelerating Sparse Accumulation in Column-wise SpGEMM
Sparse linear algebra is an important kernel in many different applications. Among various
sparse general matrix-matrix multiplication (SpGEMM) algorithms, Gustavson's column-wise …
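Gustavson's column-wise dataflow, which this entry builds on, can be sketched as follows (a behavioral model under my own assumptions: columns as dicts and a hash-based sparse accumulator, which is the accumulation step ASA accelerates in hardware):

```python
def spgemm_columnwise(A_cols, B_cols):
    """Gustavson's column-wise SpGEMM: column j of C = A @ B is formed by
    scaling the columns of A selected by the nonzeros of B[:, j] and merging
    them. Each column is a dict mapping row index -> value."""
    C_cols = []
    for bcol in B_cols:
        acc = {}  # sparse accumulator: the merge step dominates runtime
        for k, b_kj in bcol.items():          # nonzeros of B's column j
            for i, a_ik in A_cols[k].items():  # scaled column k of A
                acc[i] = acc.get(i, 0.0) + a_ik * b_kj
        C_cols.append(acc)
    return C_cols

# A = [[1, 0], [0, 2]],  B = [[0, 3], [4, 0]]  ->  C = [[0, 3], [8, 0]]
A_cols = [{0: 1.0}, {1: 2.0}]
B_cols = [{1: 4.0}, {0: 3.0}]
print(spgemm_columnwise(A_cols, B_cols))  # [{1: 8.0}, {0: 3.0}]
```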
Hirac: A Hierarchical Accelerator with Sorting-based Packing for SpGEMMs in DNN Applications
The state-of-the-art deep neural network (DNN) models use pruning to avoid over-fitting and
reduce the number of parameters. In order to improve storage and computational efficiency …
HARP: Hardware-Based Pseudo-Tiling for Sparse Matrix Multiplication Accelerator
General sparse matrix-matrix multiplication (SpGEMM) is a memory-bound workload, due to
the compression format used. To minimize data movements for input matrices, outer product …
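The outer-product formulation this entry refers to computes C as a sum of rank-1 partial products, one per inner-dimension index. A minimal sketch (the data layout and function name are my assumptions, not HARP's):

```python
import numpy as np

def spgemm_outer(A_cols, B_rows, shape):
    """Outer-product SpGEMM: C = sum_k outer(A[:, k], B[k, :]).
    A_cols[k] and B_rows[k] hold (indices, values) for column k of A and
    row k of B; each pairing yields one rank-1 partial-product matrix."""
    C = np.zeros(shape)
    for (ai, av), (bj, bv) in zip(A_cols, B_rows):
        for i, a in zip(ai, av):
            for j, b in zip(bj, bv):
                C[i, j] += a * b  # merging partial products drives data movement
    return C

# A = [[1, 0], [0, 2]],  B = [[0, 3], [4, 0]]
A_cols = [([0], [1.0]), ([1], [2.0])]
B_rows = [([1], [3.0]), ([0], [4.0])]
print(spgemm_outer(A_cols, B_rows, (2, 2)))  # equals A @ B = [[0, 3], [8, 0]]
```

Each input nonzero is read exactly once, which is the appeal of the outer-product dataflow; the cost shifts to merging the partial-product matrices.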
High-efficiency Compressor Trees for Latest AMD FPGAs
High-fan-in dot product computations are ubiquitous in highly relevant application domains,
such as signal processing and machine learning. Particularly, the diverse set of data formats …
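The compressor trees this entry optimizes reduce many addends (e.g. dot-product partial products) with 3:2 compressors before a single carry-propagate add. A bit-level behavioral model, assuming nonnegative integer operands (the reduction order here is arbitrary; the paper's contribution is mapping such trees to FPGA primitives):

```python
def csa(a, b, c):
    """3:2 compressor (carry-save adder): per bit, a + b + c = sum + 2*carry,
    so three operands become two with no carry propagation."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1

def compressor_tree_sum(operands):
    """Reduce a high-fan-in sum with 3:2 compressors until two words remain,
    then do one final carry-propagate add."""
    ops = list(operands)
    while len(ops) > 2:
        s, c = csa(ops.pop(), ops.pop(), ops.pop())
        ops += [s, c]
    return sum(ops)

print(compressor_tree_sum([3, 5, 7, 11, 13]))  # 39
```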
Eureka: Efficient Tensor Cores for One-sided Unstructured Sparsity in DNN Inference
A Gondimalla, M Thottethodi… - Proceedings of the 56th …, 2023 - dl.acm.org
Deep neural networks (DNNs), while enormously popular, continue to place ever higher
compute demand for which GPUs provide specialized matrix multipliers called tensor cores …
Griffin: Rethinking sparse optimization for deep learning architectures
This paper examines the design-space trade-offs of DNN accelerators aiming to achieve
competitive performance and efficiency metrics for all four combinations of dense or sparse …
FEASTA: A Flexible and Efficient Accelerator for Sparse Tensor Algebra in Machine Learning
Recently, sparse tensor algebra (SpTA) plays an increasingly important role in machine
learning. However, due to the unstructured sparsity of SpTA, the general-purpose …