[BOOK][B] Advanced computer architecture: parallelism, scalability, programmability

K Hwang, N Jotwani - 1993 - academia.edu
Course Syllabus Course Title: Advanced Computer Architecture
Philadelphia University Faculty of Information Technology Department of Computer Science …

Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication

G Kwasniewski, M Kabić, M Besta… - Proceedings of the …, 2019 - dl.acm.org
We propose COSMA: a parallel matrix-matrix multiplication algorithm that is near
communication-optimal for all combinations of matrix dimensions, processor counts, and …

A three-dimensional approach to parallel matrix multiplication

RC Agarwal, SM Balle, FG Gustavson… - IBM Journal of …, 1995 - ieeexplore.ieee.org
A three-dimensional (3D) matrix multiplication algorithm for massively parallel processing
systems is presented. The P processors are configured as a “virtual” processing cube with …

Evaluating HPC networks via simulation of parallel workloads

N Jain, A Bhatele, S White, T Gamblin… - SC'16: Proceedings of …, 2016 - ieeexplore.ieee.org
This paper presents an evaluation and comparison of three topologies that are popular for
building interconnection networks in large-scale supercomputers: torus, fat-tree, and …

Parallel Branch‐and‐Bound Algorithms

TG Crainic, B Le Cun… - Parallel combinatorial …, 2006 - Wiley Online Library
In the beginning of the twenty-first century, large unsolved combinatorial optimization
problems have been solved exactly. Two impressive cases should be mentioned. First are …

Tensor quantization: High-dimensional data compression

SY Chang, HC Wu - IEEE Transactions on Circuits and Systems …, 2022 - ieeexplore.ieee.org
Quantization is an important technique to transform the input sample values from a large set
(or a continuous range) into the output sample values in a small set (or a finite set). It has …

Codesign tradeoffs for high-performance, low-power linear algebra architectures

A Pedram, RA Van De Geijn… - IEEE Transactions on …, 2012 - ieeexplore.ieee.org
As technology is reaching physical limits, reducing power consumption is a key issue on our
path to sustained performance. In this paper, we study fundamental tradeoffs and limits in …

Matrix multiplication on the OTIS-mesh optoelectronic computer

CF Wang, S Sahni - IEEE Transactions on Computers, 2001 - ieeexplore.ieee.org
We develop algorithms to multiply two vectors, a vector and a matrix, and two matrices on an
OTIS-Mesh optoelectronic computer. Two mappings, group row and group submesh, of a …

SRUMMA: a matrix multiplication algorithm suitable for clusters and scalable shared memory systems

M Krishnan, J Nieplocha - 18th International Parallel and …, 2004 - ieeexplore.ieee.org
Summary form only given. We describe a novel parallel algorithm that implements a dense
matrix multiplication operation with algorithmic efficiency equivalent to that of Cannon's …

Using simulation to design extreme-scale applications and architectures: programming model exploration

CL Janssen, H Adalsteinsson, JP Kenny - ACM SIGMETRICS …, 2011 - dl.acm.org
A key problem facing application developers is that they are expected to utilize extreme
levels of parallelism soon after delivery of future leadership class machines, but developing …