Communication lower bounds and optimal algorithms for numerical linear algebra

G Ballard, E Carson, J Demmel, M Hoemmen… - Acta Numerica, 2014 - cambridge.org
The traditional metric for the efficiency of a numerical algorithm has been the number of
arithmetic operations it performs. Technological trends have long been reducing the time to …

Mesh-TensorFlow: Deep learning for supercomputers

N Shazeer, Y Cheng, N Parmar… - Advances in neural …, 2018 - proceedings.neurips.cc
Batch-splitting (data-parallelism) is the dominant distributed Deep Neural Network
(DNN) training strategy, due to its universal applicability and its amenability to Single …

A systematic survey of general sparse matrix-matrix multiplication

J Gao, W Ji, F Chang, S Han, B Wei, Z Liu… - ACM Computing …, 2023 - dl.acm.org
General Sparse Matrix-Matrix Multiplication (SpGEMM) has attracted much attention from
researchers in graph analyzing, scientific computing, and deep learning. Many optimization …

Mathematical foundations of the GraphBLAS

J Kepner, P Aaltonen, D Bader, A Buluç… - 2016 IEEE High …, 2016 - ieeexplore.ieee.org
The GraphBLAS standard (GraphBlas.org) is being developed to bring the potential of
matrix-based graph algorithms to the broadest possible audience. Mathematically, the …

Sparse matrix multiplication: The distributed block-compressed sparse row library

U Borštnik, J VandeVondele, V Weber, J Hutter - Parallel Computing, 2014 - Elsevier
Efficient parallel multiplication of sparse matrices is key to enabling many large-scale
calculations. This article presents the DBCSR (Distributed Block Compressed Sparse Row) …

TileSpGEMM: A tiled algorithm for parallel sparse general matrix-matrix multiplication on GPUs

Y Niu, Z Lu, H Ji, S Song, Z Jin, W Liu - Proceedings of the 27th ACM …, 2022 - dl.acm.org
Sparse general matrix-matrix multiplication (SpGEMM) is one of the most fundamental
building blocks in sparse linear solvers, graph processing frameworks and machine learning …

Parallel triangle counting and enumeration using matrix algebra

A Azad, A Buluç, J Gilbert - 2015 IEEE International Parallel …, 2015 - ieeexplore.ieee.org
Triangle counting and enumeration are important kernels that are used to characterize
graphs. They are also used to compute important statistics such as clustering coefficients …

An efficient GPU general sparse matrix-matrix multiplication for irregular data

W Liu, B Vinter - 2014 IEEE 28th international parallel and …, 2014 - ieeexplore.ieee.org
General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for
numerous applications such as algebraic multigrid method, breadth first search and shortest …

Communication-efficient Jaccard similarity for high-performance distributed genome comparisons

M Besta, R Kanakagiri, H Mustafa… - 2020 IEEE …, 2020 - ieeexplore.ieee.org
The Jaccard similarity index is an important measure of the overlap of two sets, widely used
in machine learning, computational genomics, information retrieval, and many other areas …

Exploiting multiple levels of parallelism in sparse matrix-matrix multiplication

A Azad, G Ballard, A Buluç, J Demmel, L Grigori… - SIAM Journal on …, 2016 - SIAM
Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many
high-performance graph algorithms as well as for some linear solvers, such as algebraic …