Spade: A flexible and scalable accelerator for SpMM and SDDMM

G Gerogiannis, S Yesil, D Lenadora, D Cao… - Proceedings of the 50th …, 2023 - dl.acm.org
The widespread use of Sparse Matrix Dense Matrix Multiplication (SpMM) and Sampled
Dense Matrix Dense Matrix Multiplication (SDDMM) kernels makes them candidates for …

Capstan: A vector RDA for sparsity

A Rucker, M Vilim, T Zhao, Y Zhang… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
This paper proposes Capstan: a scalable, parallel-patterns-based, reconfigurable dataflow
accelerator (RDA) for sparse and dense tensor applications. Instead of designing for one …

PolarFly: a cost-effective and flexible low-diameter topology

K Lakhotia, M Besta, L Monroe, K Isham… - … Conference for High …, 2022 - ieeexplore.ieee.org
In this paper we present PolarFly, a diameter-2 network topology based on the Erdős-Rényi
family of polarity graphs from finite geometry. This is the first known diameter-2 topology that …

Massive data-centric parallelism in the chiplet era

M Orenes-Vera, E Tureci, D Wentzlaff… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent works have introduced task-based parallelization schemes to accelerate graph
search and sparse data-structure traversal, where some solutions scale up to thousands of …

FEASTA: A Flexible and Efficient Accelerator for Sparse Tensor Algebra in Machine Learning

K Zhong, Z Zhu, G Dai, H Wang, X Yang… - Proceedings of the 29th …, 2024 - dl.acm.org
Recently, sparse tensor algebra (SpTA) plays an increasingly important role in machine
learning. However, due to the unstructured sparsity of SpTA, the general-purpose …

Characterizing the Scalability of Graph Convolutional Networks on Intel® PIUMA

MJ Adiletta, JJ Tithi, EI Farsarakis… - … Analysis of Systems …, 2023 - ieeexplore.ieee.org
Large-scale Graph Convolutional Network (GCN) inference on traditional CPU/GPU systems
is challenging due to a large memory footprint, sparse computational patterns, and irregular …

Accelerating Allreduce with in-network reduction on Intel PIUMA

K Lakhotia, F Petrini, R Kannan, V Prasanna - IEEE Micro, 2021 - ieeexplore.ieee.org
The Intel Programmable Integrated Unified Memory Architecture (PIUMA) system maps
collective operations directly into the network switches and supports pipelined embeddings …

DCRA: A distributed chiplet-based reconfigurable architecture for irregular applications

M Orenes-Vera, E Tureci, M Martonosi… - arXiv preprint arXiv …, 2023 - arxiv.org
In recent years, the growing demand to process large graphs and sparse datasets has led to
increased research efforts to develop hardware- and software-based architectural solutions …

SMASH: Sparse matrix atomic scratchpad hashing

K Shivdikar - 2021 - search.proquest.com
In 1812, a French mathematician named Jacques Philippe Marie Binet pointed out several
important computations involved the multiplication of two matrices [53]. On November 30 of …

In-network reductions on multi-dimensional HyperX

K Lakhotia, F Petrini, R Kannan… - 2021 IEEE Symposium …, 2021 - ieeexplore.ieee.org
The use of massively parallel systems for application scaling has pushed the performance
bottleneck towards data communication on the network. This is especially critical for …