PIUMA: programmable integrated unified memory architecture

G Gerogiannis, S Yesil, D Lenadora, D Cao… - Proceedings of the 50th …, 2023 - dl.acm.org

The widespread use of Sparse Matrix Dense Matrix Multiplication (SpMM) and Sampled
Dense Matrix Dense Matrix Multiplication (SDDMM) kernels makes them candidates for …

Cited by: 17 | Related articles | All 6 versions

Capstan: A vector RDA for sparsity

A Rucker, M Vilim, T Zhao, Y Zhang… - MICRO-54: 54th Annual …, 2021 - dl.acm.org

This paper proposes Capstan: a scalable, parallel-patterns-based, reconfigurable dataflow
accelerator (RDA) for sparse and dense tensor applications. Instead of designing for one …

Cited by: 37 | Related articles | All 3 versions

PolarFly: a cost-effective and flexible low-diameter topology

K Lakhotia, M Besta, L Monroe, K Isham… - … Conference for High …, 2022 - ieeexplore.ieee.org

In this paper we present PolarFly, a diameter-2 network topology based on the Erdős-Rényi
family of polarity graphs from finite geometry. This is the first known diameter-2 topology that …

Cited by: 20 | Related articles | All 24 versions

Massive data-centric parallelism in the chiplet era

M Orenes-Vera, E Tureci, D Wentzlaff… - arXiv preprint arXiv …, 2023 - arxiv.org

Recent works have introduced task-based parallelization schemes to accelerate graph
search and sparse data-structure traversal, where some solutions scale up to thousands of …

Cited by: 8 | Related articles | All 2 versions

FEASTA: A Flexible and Efficient Accelerator for Sparse Tensor Algebra in Machine Learning

K Zhong, Z Zhu, G Dai, H Wang, X Yang… - Proceedings of the 29th …, 2024 - dl.acm.org

Recently, sparse tensor algebra (SpTA) plays an increasingly important role in machine
learning. However, due to the unstructured sparsity of SpTA, the general-purpose …

Cited by: 3 | Related articles | All 4 versions

Characterizing the Scalability of Graph Convolutional Networks on Intel® PIUMA

MJ Adiletta, JJ Tithi, EI Farsarakis… - … Analysis of Systems …, 2023 - ieeexplore.ieee.org

Large-scale Graph Convolutional Network (GCN) inference on traditional CPU/GPU systems
is challenging due to a large memory footprint, sparse computational patterns, and irregular …

Cited by: 8 | Related articles | All 3 versions

Accelerating Allreduce with in-network reduction on Intel PIUMA

K Lakhotia, F Petrini, R Kannan, V Prasanna - IEEE Micro, 2021 - ieeexplore.ieee.org

The Intel Programmable Integrated Unified Memory Architecture (PIUMA) system maps
collective operations directly into the network switches and supports pipelined embeddings …

Cited by: 9 | Related articles | All 6 versions

DCRA: A distributed chiplet-based reconfigurable architecture for irregular applications

M Orenes-Vera, E Tureci, M Martonosi… - arXiv preprint arXiv …, 2023 - arxiv.org

In recent years, the growing demand to process large graphs and sparse datasets has led to
increased research efforts to develop hardware- and software-based architectural solutions …

Cited by: 2 | Related articles | All 2 versions

SMASH: Sparse matrix atomic scratchpad hashing

K Shivdikar - 2021 - search.proquest.com

In 1812, a French mathematician named Jacques Philippe Marie Binet pointed out several
important computations involved the multiplication of two matrices [53]. On November 30 of …

Cited by: 10 | Related articles | All 8 versions

In-network reductions on multi-dimensional HyperX

K Lakhotia, F Petrini, R Kannan… - 2021 IEEE Symposium …, 2021 - ieeexplore.ieee.org

The use of massively parallel systems for application scaling has pushed the performance
bottleneck towards data communication on the network. This is especially critical for …

Cited by: 7 | Related articles | All 2 versions