[BOOK][B] Advanced computer architecture: parallelism, scalability, programmability
Course Syllabus Course Title: Advanced Computer Architecture
Philadelphia University Faculty of Information Technology Department of Computer Science …
Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication
We propose COSMA: a parallel matrix-matrix multiplication algorithm that is near
communication-optimal for all combinations of matrix dimensions, processor counts, and …
A three-dimensional approach to parallel matrix multiplication
RC Agarwal, SM Balle, FG Gustavson… - IBM Journal of …, 1995 - ieeexplore.ieee.org
A three-dimensional (3D) matrix multiplication algorithm for massively parallel processing
systems is presented. The P processors are configured as a “virtual” processing cube with …
Evaluating HPC networks via simulation of parallel workloads
This paper presents an evaluation and comparison of three topologies that are popular for
building interconnection networks in large-scale supercomputers: torus, fat-tree, and …
Parallel Branch-and-Bound Algorithms
TG Crainic, B Le Cun… - Parallel combinatorial …, 2006 - Wiley Online Library
At the beginning of the twenty-first century, large unsolved combinatorial optimization
problems have been solved exactly. Two impressive cases should be mentioned. First are …
Tensor quantization: High-dimensional data compression
Quantization is an important technique to transform the input sample values from a large set
(or a continuous range) into the output sample values in a small set (or a finite set). It has …
Codesign tradeoffs for high-performance, low-power linear algebra architectures
A Pedram, RA Van De Geijn… - IEEE Transactions on …, 2012 - ieeexplore.ieee.org
As technology is reaching physical limits, reducing power consumption is a key issue on our
path to sustained performance. In this paper, we study fundamental tradeoffs and limits in …
Matrix multiplication on the OTIS-mesh optoelectronic computer
CF Wang, S Sahni - IEEE Transactions on Computers, 2001 - ieeexplore.ieee.org
We develop algorithms to multiply two vectors, a vector and a matrix, and two matrices on an
OTIS-Mesh optoelectronic computer. Two mappings, group row and group submesh, of a …
SRUMMA: a matrix multiplication algorithm suitable for clusters and scalable shared memory systems
M Krishnan, J Nieplocha - 18th International Parallel and …, 2004 - ieeexplore.ieee.org
Summary form only given. We describe a novel parallel algorithm that implements a dense
matrix multiplication operation with algorithmic efficiency equivalent to that of Cannon's …
Using simulation to design extreme-scale applications and architectures: programming model exploration
CL Janssen, H Adalsteinsson, JP Kenny - ACM SIGMETRICS …, 2011 - dl.acm.org
A key problem facing application developers is that they are expected to utilize extreme
levels of parallelism soon after delivery of future leadership class machines, but developing …