Spade: A flexible and scalable accelerator for SpMM and SDDMM
The widespread use of Sparse Matrix Dense Matrix Multiplication (SpMM) and Sampled
Dense Matrix Dense Matrix Multiplication (SDDMM) kernels makes them candidates for …
Capstan: A vector RDA for sparsity
This paper proposes Capstan: a scalable, parallel-patterns-based, reconfigurable dataflow
accelerator (RDA) for sparse and dense tensor applications. Instead of designing for one …
PolarFly: a cost-effective and flexible low-diameter topology
In this paper we present PolarFly, a diameter-2 network topology based on the Erdős-Rényi
family of polarity graphs from finite geometry. This is the first known diameter-2 topology that …
Massive data-centric parallelism in the chiplet era
Recent works have introduced task-based parallelization schemes to accelerate graph
search and sparse data-structure traversal, where some solutions scale up to thousands of …
FEASTA: A Flexible and Efficient Accelerator for Sparse Tensor Algebra in Machine Learning
Recently, sparse tensor algebra (SpTA) has come to play an increasingly important role in machine
learning. However, due to the unstructured sparsity of SpTA, the general-purpose …
Characterizing the Scalability of Graph Convolutional Networks on Intel® PIUMA
MJ Adiletta, JJ Tithi, EI Farsarakis… - … Analysis of Systems …, 2023 - ieeexplore.ieee.org
Large-scale Graph Convolutional Network (GCN) inference on traditional CPU/GPU systems
is challenging due to a large memory footprint, sparse computational patterns, and irregular …
Accelerating Allreduce with in-network reduction on Intel PIUMA
The Intel Programmable Integrated Unified Memory Architecture (PIUMA) system maps
collective operations directly into the network switches and supports pipelined embeddings …
DCRA: A distributed chiplet-based reconfigurable architecture for irregular applications
In recent years, the growing demand to process large graphs and sparse datasets has led to
increased research efforts to develop hardware- and software-based architectural solutions …
SMASH: Sparse matrix atomic scratchpad hashing
K Shivdikar - 2021 - search.proquest.com
In 1812, the French mathematician Jacques Philippe Marie Binet pointed out that several
important computations involved the multiplication of two matrices [53]. On November 30 of …
In-network reductions on multi-dimensional HyperX
The use of massively parallel systems for application scaling has pushed the performance
bottleneck towards data communication on the network. This is especially critical for …